Distributed System

2019-12-31发表2022-08-06更新NJU26 分钟读完 (大约3832个字)0次访问

分布式系统期末总结

分布式系统模型

什么是分布式系统

A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.

分布式系统是一组自主计算的集合,在用户看来是一个单一的一致系统

分布式系统的目标

Making resources available: 使资源多处可用
Distribution transparency: 分布式透明性
Openness: 开放性
Scalability: 可扩展性

为什么要分布式？

Because people are distributed

方面	原因
经济	多个微处理器提供了比主机更好的性价比
速度	分布式系统可以拥有更多的算力
继承的分布式	一些应用本身便涉及到空间上分散的机器
可信性	如果单个机器损坏, 整个系统仍可以存活
可增加的成长	算力可以一点点地增加

分布式系统透明性和开放性的含义。

透明性

隐藏底层的资源获取方式、资源位置、资源移动、重定位、复制、并发性、资源错误与恢复

想要实现完全的透明可能比较困难:
- 用户可能被定位在不同的组件
- 完全隐藏网络和节点的错误是（理论与实践上）不可能的
  - 难以分辨速度较慢与网络错误
  - 难以确定服务器在崩溃之前是否已经进行了某些操作
- 完全的透明性将会影响性能，暴露系统的分布式环境
  - 保存Web缓存与服务器同步
  - 将写操作立即写入磁盘以免导致错误

开放性

能够与其他开放系统的服务进行交互，不管底层的环境如何
- 系统应该定义良好的接口
- 系统应该支持应用的迁移
- 系统应该容易进行操作（interoperate）
至少使分布式系统独立于底层的运行环境

分布式操作系统、网络操作系统和基于中间件的系统。

// TODO

分布式系统的类型。

分布式计算系统：Distributed computing systems
- 集群计算：Cluster computing
  - a group of high-end systems connected through a LAN
  - Homogeneous: same OS, near-identical hardware
- 网格计算：Grid Computing
  - lots of nodes from everywhere
  - Heterogeneous
  - Dispersed across several organizations
  - Can easily span a wide-area network
- 云计算
分布式信息系统：Distributed information systems
- 当今使用最广泛的分布式系统是各种各样的传统信息系统
- Example: 交易处理系统.
- 交易是在一组对象上进行的一系列操作，满足ACID：
  - Atomicity
  - Consistency
  - Isolation
  - Durability
- 通常，交易涉及的数据会分布在不同的服务器上。TP控制器用于管理交易的执行。
Distributed pervasive systems
- Emerging next-generation of distributed systems in which nodes are small, mobile, and often embedded in a larger system, characterized by the fact that the system naturally blends into the user’s environment.
- Ubiquitous computing systems
- Mobile computing systems
- Sensor (and actuator) networks

分布式系统架构

分布式系统架构风格

Centralized

Basic Client–Server Model
- Characteristics:
  - There are processes offering services (servers)
  - There are processes that use services (clients)
  - Clients and servers can be on different machines
  - Clients follow request/reply model wrt to using services
Multiple Clients/Single Server
- Web proxy server
- Web Applets
- Server forms bottleneck
- Server forms single point of failure
- System scaling difficult
Multiple Clients/Multiple Servers
- Web proxy server
- Web Applets

Application Layering

Traditional three-layered view
- User-interface layer contains units for an application’s user interface
- Processing layer contains the functions of an application, i.e. without specific data
- Data layer contains the data that a client wants to manipulate through the application components
This layering is found in many distributed information systems, using traditional database technology and accompanying applications.

Multitiered Architectures (1)

Single-tiered: dumb terminal/mainframe configuration
Two-tiered: client/single server configuration
Three-tiered: each layer on separate machine

Decentralized

Structured P2P: 节点按照特定的分布式数据结构进行组织，可以直接按照节点ID进行查找
Unstructured P2P: 节点随机的选择邻居，使用（有限/概率）洪泛法进行查找
Hybrid P2P: 某些节点（Superpeer）具有特定的功能

Hybrid

Client-server combined with P2P
Edge-server architectures, which are often used for Content
Delivery Networks.

分布式系统组织形式

Multitiered Architectures (1)

Single-tiered: dumb terminal/mainframe configuration
Two-tiered: client/single server configuration
Three-tiered: each layer on separate machine

客户-服务器模式和对等模式

分布式系统组织为中间件

Middleware

In many cases, distributed systems/applications are developed according to a specific architectural style. The chosen style may not be optimal in all cases
need to (dynamically) adapt the behavior of the middleware.
Interceptors
Intercept the usual flow of control when invoking a remote
object.

进程与线程

进程和线程

Process: An execution stream in the context of a process
state

Execution stream
• Stream of executing instructions
• Running piece of code
• Sequential sequence of instructions
• “thread of control”
Process state
Everything that the running code can affect or be affected by

Processes vs. Threads

A process is different than a thread
Thread: “Lightweight process” (LWP)
- An execution stream that shares an address space
- Multiple threads within a single process
Example:
- Two processes examining memory address 0xffe84264 see different values (I.e., different contents)
- Two threads examining memory address 0xffe84264 see same value (I.e., same contents)

代码迁移

Approaches to code migration
Migration and local resources
- Resource types
  - Fixed: the resource cannot be migrated, such as local hardware
  - Fastened: the resource can, in principle, be migrated but only at high cost
  - Unattached: the resource can easily be moved along with the object (e.g. a cache)
- Object-to-resource binding
  - By identifier: the object requires a specific instance of a resource (e.g. a specific database)
  - By value: the object requires the value of a resource (e.g. the set of cache entries)
  - By type: the object requires that only a type of resource is available (e.g. a color monitor)
Migration in heterogeneous systems
- Main problem
  - The target machine may not be suitable to execute the migrated code
  - The definition of process/thread/processor context is highly dependent on local hardware, operating system and runtime system
- Make use of an abstract machine that is implemented on different platforms
  - Interpreted languages, effectively having their own VM
  - Virtual VM (as discussed previously)

强迁移vs.弱迁移

Strong and weak mobility

Code segment: contains the actual code
Data segment: contains the state
Execution state: contains context of thread executing the object’s code
Move only code and data segment (and reboot execution):
- Relatively simple, especially if code is portable
- Distinguish code shipping (push) from code fetching (pull)
Move component, including execution state
- Migration: move entire object from one machine to the other
- Cloning: start a clone, and set it in the same execution state

通信

通信的类型

Transient vs. persistent communication
- Transient communication: Comm. server discards message when it cannot be delivered at the next server, or at the receiver.
- Persistent communication: A message is stored at a communication server as long as it takes to deliver it.
Asynchrounous vs. synchronous communication
- At request submission
- At request delivery
- After request processing

Client/Server

Client/Server computing is generally based on a model of
transient synchronous communication:
- Client and server have to be active at time of communication
- Client issues request and blocks until it receives reply
- Server essentially waits only for incoming requests, and subsequently processes them
Drawbacks of synchronous communication
- Client cannot do any other work while waiting for reply
- Failures have to be handled immediately: the client is waiting
- The model may simply not be appropriate (mail, news)

Messaging

Aims at high-level persistent asynchronous communication:
Processes send each other messages, which are queued
Sender need not wait for immediate reply, but can do other
things
Middleware often ensures fault tolerance

远程过程调用RPC

RPC的工作过程

Observations
- Application developers are familiar with simple procedure model
- Well-engineered procedures operate in isolation (black box)
- There is no fundamental reason not to execute procedures on separate machine
Communication between caller & callee can be hidden by using procedurecall mechanism.

参数传递

There’s more than just wrapping parameters into a message:
- Client and server machines may have different data representations (think of byte ordering)
- Wrapping a parameter means transforming a value into a sequence of bytes
- Client and server have to agree on the same encoding:
  - How are basic data values represented (integers, floats, characters)
  - How are complex data values represented (arrays, unions)
- Client and server need to properly interpret messages, transforming them into machine-dependent representations.
Some assumptions:
- Copy in/copy out semantics: while procedure is executed, nothing can be assumed about parameter values.
- All data that is to be operated on is passed by parameters. Excludes passing references to (global) data.
Full access transparency cannot be realized.
A remote reference mechanism enhances access transparency:
- Remote reference offers unified access to remote data
- Remote references can be passed as parameter in RPCs

故障处理

// TODO

客户端无法定位到服务器
- 服务器下线、服务器与客户端版本不一致
- 使用特定的符号作为RPC的返回值
- 抛出一个异常或者信号
请求信息丢失
- 发送信息时使用计时器
- 计时器到期则重传
- 多次丢失：认为服务器下线，转到"无法定位到服务器"
响应信息丢失
- 发送信息时使用计时器
- 使用序列号编码请求信息，以免导致多次运行
服务器崩溃
- 等待服务器重启，重新操作，保证RPC至少被运行一次
- 立刻放弃并报告错误，保证RPC至多被运行一次
- 什么都不做，不保证任何情况
客户端崩溃
- 使用日志记录每个RPC
- 按时间进行分片
- 为每个RPC记录一个过期时间

动态绑定

To register, the server gives the binder its name, its version number, a unique identifier (32-bits), and a handle used to locate it
The handle is system dependent (e.g., Ethernet address, IP address, an X.500 address, …)
When the client calls one of the remote procedures for the first time, say, read:
- The client stub sees that it is not yet bound to a server, so it sends a message to the binder asking to import version x of server interface
- The binder checks to see if one or more servers have already exported an interface with this name and version number.
  - If no currently running server is willing to support this interface, the read call fails
  - If a suitable server exists, the binder gives its handle and unique identifier to the client stub

Advantages of Dynamic Binding

Flexibility
Can support multiple servers that support the same interface, e.g.:
- Binder can spread the clients randomly over the servers to even load
- Binder can poll the servers periodically, automatically deregistering the servers that fail to respond, to achieve a degree of fault tolerance
- Binder can assist in authentication: For example, a server specifies a list of users that can use it; the binder will refuse to tell users not on the list about the server
The binder can verify that both client and server are using the same version of the interface

Disadvantages of Dynamic Binding

The extra overhead of exporting/importing interfaces costs time
The binder may become a bottleneck in a large distributed system

基于消息的通信

Lower-level interface to provide more flexibility
Two (abstract) primitives are used to implement these
Send
Receive
Issues:
Are primitives blocking or nonblocking (synchronous or
asynchronous)?
Are primitives reliable or unreliable (persistent or transient)?

持久性/非持久性

Transient
The sender puts the message on the net and if it cannot be
delivered to the sender or to the next communication host, it is
lost.
There can be different types depending on whether it is
asynchronous or synchronous
Persistent
The message is stored in the communication system as long as it
takes to deliver the message to the receiver

同步/异步

Synchronous
The sender is blocked until its message is stored in the local
buffer at the receiving host or delivered to the receiver.
Asynchronous
The sender continues immediately after executing a send
The message is stored in the local buffer at the sending host or
at the first communication server.

流数据

A (continuous) data stream is a connection-oriented communication facility that supports isochronous data transmission.
Some common stream characteristics
- Streams are unidirectional
- There is generally a single source, and one or more sinks
- Often, either the sink and/or source is a wrapper around hardware(e.g., camera, CD device, TV monitor)
- Simple stream: a single flow of data, e.g., audio or video
- Complex stream: multiple data flows, e.g., stereo audio or combination audio/video

同步与资源管理

同步问题

How processes cooperate and synchronize with one another in a distributed system
- In single CPU systems, critical regions, mutual exclusion, and other synchronization problems are solved using methods such as semaphores.
- These methods will not work in distributed systems because they implicitly rely on the existence of shared memory.
  - If two events occur in a distributed system, it is difficult to determine which event occurred first.
How to decide on relative ordering of events
- Does one event precede another event?
- Difficult to determine if events occur on different machines.

时钟同步机制

In a centralized system:
- Time is unambiguous: A process gets the time by issuing a system call to the kernel. If process A gets the time and latter process B gets the time. The value B gets is higher than (or possibly equal to) the value A got

逻辑时钟

How do we maintain a global view on the system’s behavior
that is consistent with the happened-before relation?
Attach a timestamp C(e) to each event e, satisfying the
following properties:
P1: If a and b are two events in the same process, and a!=b, then
we demand that C 𝑎𝑎 < 𝐶𝐶(𝑏𝑏).
P2: If a corresponds to sending a message m, and b to the
receipt of that message, then also C 𝑎𝑎 < 𝐶𝐶(𝑏𝑏).
How to attach a timestamp to an event when there’s no
global clock ⇒ maintain a consistent set of logical clocks,
one per process.

Logical vs Physical Clocks

Clock synchronization need not be absolute! (due to Lamport, 1978):
- If two processes do not interact, their clocks need not be synchronized.
- What matters is not that all processes agree on exactly what time is it, but rather, that they agree on the order in which events occur.
For algorithms where only internal consistency of clocks matters (not whether clocks are close to real time), we speak of logical clocks.
For algorithms where clocks must not only be the same, but also must not deviate from real-time, we speak of physical clocks.

Lamport算法

Each process 𝑃𝑃𝑖𝑖 maintains a local counter 𝐶𝐶𝑖𝑖 and adjusts this counter according to the following rules:
1. For any two successive events that take place within $P_i$ , $C_i$ is incremented by 1.
2. Each time a message m is sent by process $P_i$ , the message receives a timestamp $ts(m)=C_i$ .
3. Whenever a message m is received by a process $P_j$ , $P_j$ adjusts its local counter $C_j$ to $max{C_j, ts(m)}$ ; then executes step 1 before passing m to the application.
Property P1 is satisfied by (1); Property P2 by (2) and (3).
It can still occur that two events happen at the same time. Avoid this by breaking ties through process IDs.

向量时戳

分布式系统中的互斥访问

Basic solutions
Via a centralized server.
Completely decentralized, using a peer-to-peer system.
Completely distributed, with no topology imposed.
Completely distributed along a (logical) ring.

分布式系统中的选举机制

复制与一致性

复制的优势与不足

Why replicate?
- Reliability
  - Avoid single points of failure
- Performance
  - Scalability in numbers and geographic area
Why not replicate?
- Replication transparency
- Consistency issues
  - Updates are costly
  - Availability may suffer if not careful

数据一致性模型

Client-Centric Consistency

More relaxed form of consistency
only concerned with replicas being eventually consistent (eventual consistency).
In the absence of any further updates, all replicas converge to identical copies of each other ->only requires guarantees that updates will be propagated.
Easy if a user always accesses the same replica; problematic if the user accesses different replicas.
- Client-centric consistency: guarantees for a single client the consistency of access to a data store.

数据一致性协议实例

Linearizability
- Primary-Backup
- Chain Replication
Sequential consistency
- Primary-based protocols
  - Remote-Write protocols
  - Local-Write protocols
- Replicated Write protocols
  - Active replication
  - Quorum-based protocols

基于法定数量的协议

容错

可信系统(DependableSystem)特征

可靠性：Reliability
- 系统的表现符合预期设定
- 系统在给定时间内不出现任何故障的概率
- Typically used to describe systems that cannot be repaired or where the continuous operation of the system is critical.
可用性：Availability
- The fraction of the time that a system meets its specification.
- The probability that the system is operational at a given time t.
安全性：Safety
- 当系统暂时地偏离了预期设定, 也不会造成较大事故
可维护性：Maintainability
- 用于衡量系统维护是否困难

提高系统可信性的途径

通过冗余掩饰系统异常
- 冗余信息
  - 通过附加额外的比特检测和恢复传输错误
- 冗余时间
  - 当一个交易中止后, 在不会造成负面影响的情况下重新执行它
- Physical redundancy
  - 硬件冗余
    - 使用多个服务器降低服务器失效风险
  - 软件冗余
    - 通过相同的方式多次运行
- 冗余处理
  - 处理组: 所有的成员可以相互接替工作, 动态调整, 扁平化的结构

K容错系统

K 容错系统需要的备份数:
- Fail-silent faults : K+1
- Byzantine faults : 2K+1 : majority
组的结构: flat/hierarchical
Need for managing groups and group membership
- centralized: group server
- distributed: totally-ordered reliable multicast

拜占庭问题（Byzantine Problem）

Algorithm to reach agreement. They perform the following :

step 1: every general sends a message to every other general telling his strength ( true or lie)
step 2: each general collects received messages to form a vector
step 3: every general passes his vector to every other general
step 4: each general examines the ith element of each of the newly received vectors.
If any value has a majority, that value is put into the result vector

系统恢复

回退恢复
前向恢复
检查点（Check point）

Distributed System

http://blog.czccc.cc/p/7af16b6e/

作者

Cheng

发布于

2019-12-31

更新于

2022-08-06

许可协议

Distributed System

分布式系统模型

什么是分布式系统

分布式系统的目标

为什么要分布式？

分布式系统透明性和开放性的含义。

透明性

开放性

分布式操作系统、网络操作系统和基于中间件的系统。

分布式系统的类型。

分布式系统架构

分布式系统架构风格

Centralized

Decentralized

Hybrid

分布式系统组织形式

客户-服务器模式和对等模式

分布式系统组织为中间件

进程与线程

进程和线程

代码迁移

强迁移vs.弱迁移

通信

通信的类型

远程过程调用RPC

RPC的工作过程

参数传递

故障处理

动态绑定

基于消息的通信

持久性/非持久性

同步/异步

流数据

同步与资源管理

同步问题

时钟同步机制

逻辑时钟

Lamport算法

向量时戳

分布式系统中的互斥访问

分布式系统中的选举机制

复制与一致性

复制的优势与不足

数据一致性模型

数据一致性协议实例

基于法定数量的协议

容错

可信系统(DependableSystem)特征

提高系统可信性的途径

K容错系统

拜占庭问题（Byzantine Problem）

系统恢复

作者

发布于

更新于

许可协议

喜欢这篇文章？打赏一下作者吧

评论

目录