Distributed System
分布式系统期末总结
分布式系统模型
什么是分布式系统
A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.
分布式系统是一组自主计算的集合,在用户看来是一个单一的一致系统
分布式系统的目标
- Making resources available: 使资源多处可用
- Distribution transparency: 分布式透明性
- Openness: 开放性
- Scalability: 可扩展性
为什么要分布式?
Because people are distributed
方面 | 原因 |
---|---|
经济 | 多个微处理器提供了比主机更好的性价比 |
速度 | 分布式系统可以拥有更多的算力 |
继承的分布式 | 一些应用本身便涉及到空间上分散的机器 |
可信性 | 如果单个机器损坏, 整个系统仍可以存活 |
可增加的成长 | 算力可以一点点地增加 |
分布式系统透明性和开放性的含义。
透明性
隐藏底层的资源获取方式、资源位置、资源移动、重定位、复制、并发性、资源错误与恢复
- 想要实现完全的透明可能比较困难:
- 用户可能被定位在不同的组件
- 完全隐藏网络和节点的错误是(理论与实践上)不可能的
- 难以分辨速度较慢与网络错误
- 难以确定服务器在崩溃之前是否已经进行了某些操作
- 完全的透明性将会影响性能,暴露系统的分布式环境
- 保存Web缓存与服务器同步
- 将写操作立即写入磁盘以免导致错误
开放性
- 能够与其他开放系统的服务进行交互,不管底层的环境如何
- 系统应该定义良好的接口
- 系统应该支持应用的迁移
- 系统应该容易进行操作(interoperate)
- 至少使分布式系统独立于底层的运行环境
分布式操作系统、网络操作系统和基于中间件的系统。
// TODO
分布式系统的类型。
- 分布式计算系统:Distributed computing systems
- 集群计算:Cluster computing
- a group of high-end systems connected through a LAN
- Homogeneous: same OS, near-identical hardware
- 网格计算:Grid Computing
- lots of nodes from everywhere
- Heterogeneous
- Dispersed across several organizations
- Can easily span a wide-area network
- 云计算
- 集群计算:Cluster computing
- 分布式信息系统:Distributed information systems
- 当今使用最广泛的分布式系统是各种各样的传统信息系统
- Example: 交易处理系统.
- 交易是在一组对象上进行的一系列操作,满足ACID:
- Atomicity
- Consistency
- Isolation
- Durability
- 通常,交易涉及的数据会分布在不同的服务器上。TP控制器用于管理交易的执行。
- Distributed pervasive systems
- Emerging next-generation of distributed systems in which nodes are small, mobile, and often embedded in a larger system, characterized by the fact that the system naturally blends into the user’s environment.
- Ubiquitous computing systems
- Mobile computing systems
- Sensor (and actuator) networks
分布式系统架构
分布式系统架构风格
Centralized
- Basic Client–Server Model
- Characteristics:
- There are processes offering services (servers)
- There are processes that use services (clients)
- Clients and servers can be on different machines
- Clients follow request/reply model wrt to using services
- Characteristics:
- Multiple Clients/Single Server
- Web proxy server
- Web Applets
- Server forms bottleneck
- Server forms single point of failure
- System scaling difficult
- Multiple Clients/Multiple Servers
- Web proxy server
- Web Applets
Application Layering
- Traditional three-layered view
- User-interface layer contains units for an application’s user interface
- Processing layer contains the functions of an application, i.e. without specific data
- Data layer contains the data that a client wants to manipulate through the application components
- This layering is found in many distributed information systems, using traditional database technology and accompanying applications.
Multitiered Architectures (1)
- Single-tiered: dumb terminal/mainframe configuration
- Two-tiered: client/single server configuration
- Three-tiered: each layer on separate machine
Decentralized
- Structured P2P: 节点按照特定的分布式数据结构进行组织,可以直接按照节点ID进行查找
- Unstructured P2P: 节点随机的选择邻居,使用(有限/概率)洪泛法进行查找
- Hybrid P2P: 某些节点(Superpeer)具有特定的功能
Hybrid
- Client-server combined with P2P
- Edge-server architectures, which are often used for Content
Delivery Networks.
分布式系统组织形式
Multitiered Architectures (1)
- Single-tiered: dumb terminal/mainframe configuration
- Two-tiered: client/single server configuration
- Three-tiered: each layer on separate machine
客户-服务器模式和对等模式
分布式系统组织为中间件
Middleware
- In many cases, distributed systems/applications are developed according to a specific architectural style. The chosen style may not be optimal in all cases
- need to (dynamically) adapt the behavior of the middleware.
- Interceptors
- Intercept the usual flow of control when invoking a remote
object.
进程与线程
进程和线程
Process: An execution stream in the context of a process
state
- Execution stream
• Stream of executing instructions
• Running piece of code
• Sequential sequence of instructions
• “thread of control” - Process state
- Everything that the running code can affect or be affected by
Processes vs. Threads
-
A process is different than a thread
-
Thread: “Lightweight process” (LWP)
- An execution stream that shares an address space
- Multiple threads within a single process
-
Example:
- Two processes examining memory address 0xffe84264 see different values (I.e., different contents)
- Two threads examining memory address 0xffe84264 see same value (I.e., same contents)
代码迁移
- Approaches to code migration
- Migration and local resources
- Resource types
- Fixed: the resource cannot be migrated, such as local hardware
- Fastened: the resource can, in principle, be migrated but only at high cost
- Unattached: the resource can easily be moved along with the object (e.g. a cache)
- Object-to-resource binding
- By identifier: the object requires a specific instance of a resource (e.g. a specific database)
- By value: the object requires the value of a resource (e.g. the set of cache entries)
- By type: the object requires that only a type of resource is available (e.g. a color monitor)
- Resource types
- Migration in heterogeneous systems
- Main problem
- The target machine may not be suitable to execute the migrated code
- The definition of process/thread/processor context is highly dependent on local hardware, operating system and runtime system
- Make use of an abstract machine that is implemented on different platforms
- Interpreted languages, effectively having their own VM
- Virtual VM (as discussed previously)
- Main problem
强迁移vs.弱迁移
Strong and weak mobility
-
Code segment: contains the actual code
-
Data segment: contains the state
-
Execution state: contains context of thread executing the object’s code
-
Move only code and data segment (and reboot execution):
- Relatively simple, especially if code is portable
- Distinguish code shipping (push) from code fetching (pull)
-
Move component, including execution state
- Migration: move entire object from one machine to the other
- Cloning: start a clone, and set it in the same execution state
通信
通信的类型
- Transient vs. persistent communication
- Transient communication: Comm. server discards message when it cannot be delivered at the next server, or at the receiver.
- Persistent communication: A message is stored at a communication server as long as it takes to deliver it.
- Asynchrounous vs. synchronous communication
- At request submission
- At request delivery
- After request processing
Client/Server
- Client/Server computing is generally based on a model of
transient synchronous communication:- Client and server have to be active at time of communication
- Client issues request and blocks until it receives reply
- Server essentially waits only for incoming requests, and subsequently processes them
- Drawbacks of synchronous communication
- Client cannot do any other work while waiting for reply
- Failures have to be handled immediately: the client is waiting
- The model may simply not be appropriate (mail, news)
Messaging
- Aims at high-level persistent asynchronous communication:
- Processes send each other messages, which are queued
- Sender need not wait for immediate reply, but can do other
things - Middleware often ensures fault tolerance
远程过程调用RPC
RPC的工作过程
- Observations
- Application developers are familiar with simple procedure model
- Well-engineered procedures operate in isolation (black box)
- There is no fundamental reason not to execute procedures on separate machine
- Communication between caller & callee can be hidden by using procedurecall mechanism.
参数传递
- There’s more than just wrapping parameters into a message:
- Client and server machines may have different data representations (think of byte ordering)
- Wrapping a parameter means transforming a value into a sequence of bytes
- Client and server have to agree on the same encoding:
- How are basic data values represented (integers, floats, characters)
- How are complex data values represented (arrays, unions)
- Client and server need to properly interpret messages, transforming them into machine-dependent representations.
- Some assumptions:
- Copy in/copy out semantics: while procedure is executed, nothing can be assumed about parameter values.
- All data that is to be operated on is passed by parameters. Excludes passing references to (global) data.
- Full access transparency cannot be realized.
- A remote reference mechanism enhances access transparency:
- Remote reference offers unified access to remote data
- Remote references can be passed as parameter in RPCs
故障处理
// TODO
- 客户端无法定位到服务器
- 服务器下线、服务器与客户端版本不一致
- 使用特定的符号作为RPC的返回值
- 抛出一个异常或者信号
- 请求信息丢失
- 发送信息时使用计时器
- 计时器到期则重传
- 多次丢失:认为服务器下线,转到"无法定位到服务器"
- 响应信息丢失
- 发送信息时使用计时器
- 使用序列号编码请求信息,以免导致多次运行
- 服务器崩溃
- 等待服务器重启,重新操作,保证RPC至少被运行一次
- 立刻放弃并报告错误,保证RPC至多被运行一次
- 什么都不做,不保证任何情况
- 客户端崩溃
- 使用日志记录每个RPC
- 按时间进行分片
- 为每个RPC记录一个过期时间
动态绑定
-
To register, the server gives the binder its name, its version number, a unique identifier (32-bits), and a handle used to locate it
-
The handle is system dependent (e.g., Ethernet address, IP address, an X.500 address, …)
-
When the client calls one of the remote procedures for the first time, say, read:
- The client stub sees that it is not yet bound to a server, so it sends a message to the binder asking to import version x of server interface
- The binder checks to see if one or more servers have already exported an interface with this name and version number.
- If no currently running server is willing to support this interface, the read call fails
- If a suitable server exists, the binder gives its handle and unique identifier to the client stub
Advantages of Dynamic Binding
- Flexibility
- Can support multiple servers that support the same interface, e.g.:
- Binder can spread the clients randomly over the servers to even load
- Binder can poll the servers periodically, automatically deregistering the servers that fail to respond, to achieve a degree of fault tolerance
- Binder can assist in authentication: For example, a server specifies a list of users that can use it; the binder will refuse to tell users not on the list about the server
- The binder can verify that both client and server are using the same version of the interface
Disadvantages of Dynamic Binding
- The extra overhead of exporting/importing interfaces costs time
- The binder may become a bottleneck in a large distributed system
基于消息的通信
- Lower-level interface to provide more flexibility
- Two (abstract) primitives are used to implement these
- Send
- Receive
- Issues:
- Are primitives blocking or nonblocking (synchronous or
asynchronous)? - Are primitives reliable or unreliable (persistent or transient)?
持久性/非持久性
- Transient
- The sender puts the message on the net and if it cannot be
delivered to the sender or to the next communication host, it is
lost. - There can be different types depending on whether it is
asynchronous or synchronous - Persistent
- The message is stored in the communication system as long as it
takes to deliver the message to the receiver
同步/异步
- Synchronous
- The sender is blocked until its message is stored in the local
buffer at the receiving host or delivered to the receiver. - Asynchronous
- The sender continues immediately after executing a send
- The message is stored in the local buffer at the sending host or
at the first communication server.
流数据
- A (continuous) data stream is a connection-oriented communication facility that supports isochronous data transmission.
- Some common stream characteristics
- Streams are unidirectional
- There is generally a single source, and one or more sinks
- Often, either the sink and/or source is a wrapper around hardware(e.g., camera, CD device, TV monitor)
- Simple stream: a single flow of data, e.g., audio or video
- Complex stream: multiple data flows, e.g., stereo audio or combination audio/video
同步与资源管理
同步问题
- How processes cooperate and synchronize with one another in a distributed system
- In single CPU systems, critical regions, mutual exclusion, and other synchronization problems are solved using methods such as semaphores.
- These methods will not work in distributed systems because they implicitly rely on the existence of shared memory.
- If two events occur in a distributed system, it is difficult to determine which event occurred first.
- How to decide on relative ordering of events
- Does one event precede another event?
- Difficult to determine if events occur on different machines.
时钟同步机制
- In a centralized system:
- Time is unambiguous: A process gets the time by issuing a system call to the kernel. If process A gets the time and latter process B gets the time. The value B gets is higher than (or possibly equal to) the value A got
逻辑时钟
- How do we maintain a global view on the system’s behavior
that is consistent with the happened-before relation? - Attach a timestamp C(e) to each event e, satisfying the
following properties: - P1: If a and b are two events in the same process, and a!=b, then
we demand that C 𝑎𝑎 < 𝐶𝐶(𝑏𝑏). - P2: If a corresponds to sending a message m, and b to the
receipt of that message, then also C 𝑎𝑎 < 𝐶𝐶(𝑏𝑏). - How to attach a timestamp to an event when there’s no
global clock ⇒ maintain a consistent set of logical clocks,
one per process.
Logical vs Physical Clocks
- Clock synchronization need not be absolute! (due to Lamport, 1978):
- If two processes do not interact, their clocks need not be synchronized.
- What matters is not that all processes agree on exactly what time is it, but rather, that they agree on the order in which events occur.
- For algorithms where only internal consistency of clocks matters (not whether clocks are close to real time), we speak of logical clocks.
- For algorithms where clocks must not only be the same, but also must not deviate from real-time, we speak of physical clocks.
Lamport算法
- Each process 𝑃𝑃𝑖𝑖 maintains a local counter 𝐶𝐶𝑖𝑖 and adjusts this counter according to the following rules:
- For any two successive events that take place within , is incremented by 1.
- Each time a message m is sent by process , the message receives a timestamp .
- Whenever a message m is received by a process , adjusts its local counter to ; then executes step 1 before passing m to the application.
- Property P1 is satisfied by (1); Property P2 by (2) and (3).
- It can still occur that two events happen at the same time. Avoid this by breaking ties through process IDs.
向量时戳
分布式系统中的互斥访问
- Basic solutions
- Via a centralized server.
- Completely decentralized, using a peer-to-peer system.
- Completely distributed, with no topology imposed.
- Completely distributed along a (logical) ring.
分布式系统中的选举机制
复制与一致性
复制的优势与不足
- Why replicate?
- Reliability
- Avoid single points of failure
- Performance
- Scalability in numbers and geographic area
- Reliability
- Why not replicate?
- Replication transparency
- Consistency issues
- Updates are costly
- Availability may suffer if not careful
数据一致性模型
Client-Centric Consistency
- More relaxed form of consistency
- only concerned with replicas being eventually consistent (eventual consistency).
- In the absence of any further updates, all replicas converge to identical copies of each other ->only requires guarantees that updates will be propagated.
- Easy if a user always accesses the same replica; problematic if the user accesses different replicas.
- Client-centric consistency: guarantees for a single client the consistency of access to a data store.
数据一致性协议实例
- Linearizability
- Primary-Backup
- Chain Replication
- Sequential consistency
- Primary-based protocols
- Remote-Write protocols
- Local-Write protocols
- Replicated Write protocols
- Active replication
- Quorum-based protocols
- Primary-based protocols
基于法定数量的协议
容错
可信系统(DependableSystem)特征
- 可靠性:Reliability
- 系统的表现符合预期设定
- 系统在给定时间内不出现任何故障的概率
- Typically used to describe systems that cannot be repaired or where the continuous operation of the system is critical.
- 可用性:Availability
- The fraction of the time that a system meets its specification.
- The probability that the system is operational at a given time t.
- 安全性:Safety
- 当系统暂时地偏离了预期设定, 也不会造成较大事故
- 可维护性:Maintainability
- 用于衡量系统维护是否困难
提高系统可信性的途径
- 通过冗余掩饰系统异常
- 冗余信息
- 通过附加额外的比特检测和恢复传输错误
- 冗余时间
- 当一个交易中止后, 在不会造成负面影响的情况下重新执行它
- Physical redundancy
- 硬件冗余
- 使用多个服务器降低服务器失效风险
- 软件冗余
- 通过相同的方式多次运行
- 硬件冗余
- 冗余处理
- 处理组: 所有的成员可以相互接替工作, 动态调整, 扁平化的结构
- 冗余信息
K容错系统
- K 容错系统需要的备份数:
- Fail-silent faults : K+1
- Byzantine faults : 2K+1 : majority
- 组的结构: flat/hierarchical
- Need for managing groups and group membership
- centralized: group server
- distributed: totally-ordered reliable multicast
拜占庭问题(Byzantine Problem)
Algorithm to reach agreement. They perform the following :
- step 1: every general sends a message to every other general telling his strength ( true or lie)
- step 2: each general collects received messages to form a vector
- step 3: every general passes his vector to every other general
- step 4: each general examines the ith element of each of the newly received vectors.
- If any value has a majority, that value is put into the result vector
系统恢复
- 回退恢复
- 前向恢复
- 检查点(Check point)
Distributed System