Distributed System

分布式系统期末总结


分布式系统模型

什么是分布式系统

A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.

分布式系统是一组自主计算的集合,在用户看来是一个单一的一致系统

l1ordS.png

分布式系统的目标

  • Making resources available: 使资源多处可用
  • Distribution transparency: 分布式透明性
  • Openness: 开放性
  • Scalability: 可扩展性

为什么要分布式?

Because people are distributed

l1o5ZT.png

方面 原因
经济 多个微处理器提供了比主机更好的性价比
速度 分布式系统可以拥有更多的算力
继承的分布式 一些应用本身便涉及到空间上分散的机器
可信性 如果单个机器损坏, 整个系统仍可以存活
可增加的成长 算力可以一点点地增加

分布式系统透明性和开放性的含义。

透明性

l1OB1e.png

隐藏底层的资源获取方式、资源位置、资源移动、重定位、复制、并发性、资源错误与恢复

  • 想要实现完全的透明可能比较困难:
    • 用户可能被定位在不同的组件
    • 完全隐藏网络和节点的错误是(理论与实践上)不可能的
      • 难以分辨速度较慢与网络错误
      • 难以确定服务器在崩溃之前是否已经进行了某些操作
    • 完全的透明性将会影响性能,暴露系统的分布式环境
      • 保存Web缓存与服务器同步
      • 将写操作立即写入磁盘以免导致错误

开放性

  • 能够与其他开放系统的服务进行交互,不管底层的环境如何
    • 系统应该定义良好的接口
    • 系统应该支持应用的迁移
    • 系统应该容易进行操作(interoperate)
  • 至少使分布式系统独立于底层的运行环境

分布式操作系统、网络操作系统和基于中间件的系统。

// TODO

分布式系统的类型。

  • 分布式计算系统:Distributed computing systems
    • 集群计算:Cluster computing
      • a group of high-end systems connected through a LAN
      • Homogeneous: same OS, near-identical hardware
    • 网格计算:Grid Computing
      • lots of nodes from everywhere
      • Heterogeneous
      • Dispersed across several organizations
      • Can easily span a wide-area network
    • 云计算
  • 分布式信息系统:Distributed information systems
    • 当今使用最广泛的分布式系统是各种各样的传统信息系统
    • Example: 交易处理系统.
    • 交易是在一组对象上进行的一系列操作,满足ACID:
      • Atomicity
      • Consistency
      • Isolation
      • Durability
    • 通常,交易涉及的数据会分布在不同的服务器上。TP控制器用于管理交易的执行。
  • Distributed pervasive systems
    • Emerging next-generation of distributed systems in which nodes are small, mobile, and often embedded in a larger system, characterized by the fact that the system naturally blends into the user’s environment.
    • Ubiquitous computing systems
    • Mobile computing systems
    • Sensor (and actuator) networks

分布式系统架构

分布式系统架构风格

Centralized

  • Basic Client–Server Model
    • Characteristics:
      • There are processes offering services (servers)
      • There are processes that use services (clients)
      • Clients and servers can be on different machines
      • Clients follow request/reply model wrt to using services
  • Multiple Clients/Single Server
    • Web proxy server
    • Web Applets
    • Server forms bottleneck
    • Server forms single point of failure
    • System scaling difficult
  • Multiple Clients/Multiple Servers
    • Web proxy server
    • Web Applets

Application Layering

  • Traditional three-layered view
    • User-interface layer contains units for an application’s user interface
    • Processing layer contains the functions of an application, i.e. without specific data
    • Data layer contains the data that a client wants to manipulate through the application components
  • This layering is found in many distributed information systems, using traditional database technology and accompanying applications.

Multitiered Architectures (1)

  • Single-tiered: dumb terminal/mainframe configuration
  • Two-tiered: client/single server configuration
  • Three-tiered: each layer on separate machine

Decentralized

  • Structured P2P: 节点按照特定的分布式数据结构进行组织,可以直接按照节点ID进行查找
  • Unstructured P2P: 节点随机的选择邻居,使用(有限/概率)洪泛法进行查找
  • Hybrid P2P: 某些节点(Superpeer)具有特定的功能

Hybrid

  • Client-server combined with P2P
  • Edge-server architectures, which are often used for Content
    Delivery Networks.

分布式系统组织形式

Multitiered Architectures (1)

  • Single-tiered: dumb terminal/mainframe configuration
  • Two-tiered: client/single server configuration
  • Three-tiered: each layer on separate machine

客户-服务器模式和对等模式

分布式系统组织为中间件

l1xbMF.png

Middleware

  • In many cases, distributed systems/applications are developed according to a specific architectural style. The chosen style may not be optimal in all cases
  • need to (dynamically) adapt the behavior of the middleware.
  • Interceptors
  • Intercept the usual flow of control when invoking a remote
    object.

进程与线程

进程和线程

Process: An execution stream in the context of a process
state

  • Execution stream
    • Stream of executing instructions
    • Running piece of code
    • Sequential sequence of instructions
    • “thread of control”
  • Process state
  • Everything that the running code can affect or be affected by

Processes vs. Threads

  • A process is different than a thread

  • Thread: “Lightweight process” (LWP)

    • An execution stream that shares an address space
    • Multiple threads within a single process
  • Example:

    • Two processes examining memory address 0xffe84264 see different values (I.e., different contents)
    • Two threads examining memory address 0xffe84264 see same value (I.e., same contents)

代码迁移

  • Approaches to code migration
  • Migration and local resources
    • Resource types
      • Fixed: the resource cannot be migrated, such as local hardware
      • Fastened: the resource can, in principle, be migrated but only at high cost
      • Unattached: the resource can easily be moved along with the object (e.g. a cache)
    • Object-to-resource binding
      • By identifier: the object requires a specific instance of a resource (e.g. a specific database)
      • By value: the object requires the value of a resource (e.g. the set of cache entries)
      • By type: the object requires that only a type of resource is available (e.g. a color monitor)
  • Migration in heterogeneous systems
    • Main problem
      • The target machine may not be suitable to execute the migrated code
      • The definition of process/thread/processor context is highly dependent on local hardware, operating system and runtime system
    • Make use of an abstract machine that is implemented on different platforms
      • Interpreted languages, effectively having their own VM
      • Virtual VM (as discussed previously)

强迁移vs.弱迁移

Strong and weak mobility

  • Code segment: contains the actual code

  • Data segment: contains the state

  • Execution state: contains context of thread executing the object’s code

  • Move only code and data segment (and reboot execution):

    • Relatively simple, especially if code is portable
    • Distinguish code shipping (push) from code fetching (pull)
  • Move component, including execution state

    • Migration: move entire object from one machine to the other
    • Cloning: start a clone, and set it in the same execution state

通信

通信的类型

  • Transient vs. persistent communication
    • Transient communication: Comm. server discards message when it cannot be delivered at the next server, or at the receiver.
    • Persistent communication: A message is stored at a communication server as long as it takes to deliver it.
  • Asynchrounous vs. synchronous communication
    • At request submission
    • At request delivery
    • After request processing

Client/Server

  • Client/Server computing is generally based on a model of
    transient synchronous communication:
    • Client and server have to be active at time of communication
    • Client issues request and blocks until it receives reply
    • Server essentially waits only for incoming requests, and subsequently processes them
  • Drawbacks of synchronous communication
    • Client cannot do any other work while waiting for reply
    • Failures have to be handled immediately: the client is waiting
    • The model may simply not be appropriate (mail, news)

Messaging

  • Aims at high-level persistent asynchronous communication:
  • Processes send each other messages, which are queued
  • Sender need not wait for immediate reply, but can do other
    things
  • Middleware often ensures fault tolerance

远程过程调用RPC

RPC的工作过程

  • Observations
    • Application developers are familiar with simple procedure model
    • Well-engineered procedures operate in isolation (black box)
    • There is no fundamental reason not to execute procedures on separate machine
  • Communication between caller & callee can be hidden by using procedurecall mechanism.

参数传递

  • There’s more than just wrapping parameters into a message:
    • Client and server machines may have different data representations (think of byte ordering)
    • Wrapping a parameter means transforming a value into a sequence of bytes
    • Client and server have to agree on the same encoding:
      • How are basic data values represented (integers, floats, characters)
      • How are complex data values represented (arrays, unions)
    • Client and server need to properly interpret messages, transforming them into machine-dependent representations.
  • Some assumptions:
    • Copy in/copy out semantics: while procedure is executed, nothing can be assumed about parameter values.
    • All data that is to be operated on is passed by parameters. Excludes passing references to (global) data.
  • Full access transparency cannot be realized.
  • A remote reference mechanism enhances access transparency:
    • Remote reference offers unified access to remote data
    • Remote references can be passed as parameter in RPCs

故障处理

// TODO

  • 客户端无法定位到服务器
    • 服务器下线、服务器与客户端版本不一致
    • 使用特定的符号作为RPC的返回值
    • 抛出一个异常或者信号
  • 请求信息丢失
    • 发送信息时使用计时器
    • 计时器到期则重传
    • 多次丢失:认为服务器下线,转到"无法定位到服务器"
  • 响应信息丢失
    • 发送信息时使用计时器
    • 使用序列号编码请求信息,以免导致多次运行
  • 服务器崩溃
    • 等待服务器重启,重新操作,保证RPC至少被运行一次
    • 立刻放弃并报告错误,保证RPC至多被运行一次
    • 什么都不做,不保证任何情况
  • 客户端崩溃
    • 使用日志记录每个RPC
    • 按时间进行分片
    • 为每个RPC记录一个过期时间

动态绑定

  • To register, the server gives the binder its name, its version number, a unique identifier (32-bits), and a handle used to locate it

  • The handle is system dependent (e.g., Ethernet address, IP address, an X.500 address, …)

  • When the client calls one of the remote procedures for the first time, say, read:

    • The client stub sees that it is not yet bound to a server, so it sends a message to the binder asking to import version x of server interface
    • The binder checks to see if one or more servers have already exported an interface with this name and version number.
      • If no currently running server is willing to support this interface, the read call fails
      • If a suitable server exists, the binder gives its handle and unique identifier to the client stub

Advantages of Dynamic Binding

  • Flexibility
  • Can support multiple servers that support the same interface, e.g.:
    • Binder can spread the clients randomly over the servers to even load
    • Binder can poll the servers periodically, automatically deregistering the servers that fail to respond, to achieve a degree of fault tolerance
    • Binder can assist in authentication: For example, a server specifies a list of users that can use it; the binder will refuse to tell users not on the list about the server
  • The binder can verify that both client and server are using the same version of the interface

Disadvantages of Dynamic Binding

  • The extra overhead of exporting/importing interfaces costs time
  • The binder may become a bottleneck in a large distributed system

基于消息的通信

  • Lower-level interface to provide more flexibility
  • Two (abstract) primitives are used to implement these
  • Send
  • Receive
  • Issues:
  • Are primitives blocking or nonblocking (synchronous or
    asynchronous)?
  • Are primitives reliable or unreliable (persistent or transient)?

持久性/非持久性

  • Transient
  • The sender puts the message on the net and if it cannot be
    delivered to the sender or to the next communication host, it is
    lost.
  • There can be different types depending on whether it is
    asynchronous or synchronous
  • Persistent
  • The message is stored in the communication system as long as it
    takes to deliver the message to the receiver

同步/异步

  • Synchronous
  • The sender is blocked until its message is stored in the local
    buffer at the receiving host or delivered to the receiver.
  • Asynchronous
  • The sender continues immediately after executing a send
  • The message is stored in the local buffer at the sending host or
    at the first communication server.

流数据

  • A (continuous) data stream is a connection-oriented communication facility that supports isochronous data transmission.
  • Some common stream characteristics
    • Streams are unidirectional
    • There is generally a single source, and one or more sinks
    • Often, either the sink and/or source is a wrapper around hardware(e.g., camera, CD device, TV monitor)
    • Simple stream: a single flow of data, e.g., audio or video
    • Complex stream: multiple data flows, e.g., stereo audio or combination audio/video

同步与资源管理

同步问题

  • How processes cooperate and synchronize with one another in a distributed system
    • In single CPU systems, critical regions, mutual exclusion, and other synchronization problems are solved using methods such as semaphores.
    • These methods will not work in distributed systems because they implicitly rely on the existence of shared memory.
      • If two events occur in a distributed system, it is difficult to determine which event occurred first.
  • How to decide on relative ordering of events
    • Does one event precede another event?
    • Difficult to determine if events occur on different machines.

时钟同步机制

  • In a centralized system:
    • Time is unambiguous: A process gets the time by issuing a system call to the kernel. If process A gets the time and latter process B gets the time. The value B gets is higher than (or possibly equal to) the value A got

逻辑时钟

  • How do we maintain a global view on the system’s behavior
    that is consistent with the happened-before relation?
  • Attach a timestamp C(e) to each event e, satisfying the
    following properties:
  • P1: If a and b are two events in the same process, and a!=b, then
    we demand that C 𝑎𝑎 < 𝐶𝐶(𝑏𝑏).
  • P2: If a corresponds to sending a message m, and b to the
    receipt of that message, then also C 𝑎𝑎 < 𝐶𝐶(𝑏𝑏).
  • How to attach a timestamp to an event when there’s no
    global clock ⇒ maintain a consistent set of logical clocks,
    one per process.

Logical vs Physical Clocks

  • Clock synchronization need not be absolute! (due to Lamport, 1978):
    • If two processes do not interact, their clocks need not be synchronized.
    • What matters is not that all processes agree on exactly what time is it, but rather, that they agree on the order in which events occur.
  • For algorithms where only internal consistency of clocks matters (not whether clocks are close to real time), we speak of logical clocks.
  • For algorithms where clocks must not only be the same, but also must not deviate from real-time, we speak of physical clocks.

Lamport算法

  • Each process 𝑃𝑃𝑖𝑖 maintains a local counter 𝐶𝐶𝑖𝑖 and adjusts this counter according to the following rules:
    1. For any two successive events that take place within PiP_i , CiC_i is incremented by 1.
    2. Each time a message m is sent by process PiP_i , the message receives a timestamp ts(m)=Cits(m)=C_i .
    3. Whenever a message m is received by a process PjP_j , PjP_j adjusts its local counter CjC_j to maxCj,ts(m)max{C_j, ts(m)}; then executes step 1 before passing m to the application.
  • Property P1 is satisfied by (1); Property P2 by (2) and (3).
  • It can still occur that two events happen at the same time. Avoid this by breaking ties through process IDs.

向量时戳

lJ0TFx.png

分布式系统中的互斥访问

  • Basic solutions
  • Via a centralized server.
  • Completely decentralized, using a peer-to-peer system.
  • Completely distributed, with no topology imposed.
  • Completely distributed along a (logical) ring.

分布式系统中的选举机制

复制与一致性

复制的优势与不足

  • Why replicate?
    • Reliability
      • Avoid single points of failure
    • Performance
      • Scalability in numbers and geographic area
  • Why not replicate?
    • Replication transparency
    • Consistency issues
      • Updates are costly
      • Availability may suffer if not careful

数据一致性模型

l1zoTA.png

Client-Centric Consistency

  • More relaxed form of consistency
  • only concerned with replicas being eventually consistent (eventual consistency).
  • In the absence of any further updates, all replicas converge to identical copies of each other ->only requires guarantees that updates will be propagated.
  • Easy if a user always accesses the same replica; problematic if the user accesses different replicas.
    • Client-centric consistency: guarantees for a single client the consistency of access to a data store.

数据一致性协议实例

  • Linearizability
    • Primary-Backup
    • Chain Replication
  • Sequential consistency
    • Primary-based protocols
      • Remote-Write protocols
      • Local-Write protocols
    • Replicated Write protocols
      • Active replication
      • Quorum-based protocols

基于法定数量的协议

lYHE9K.png

容错

可信系统(DependableSystem)特征

  • 可靠性:Reliability
    • 系统的表现符合预期设定
    • 系统在给定时间内不出现任何故障的概率
    • Typically used to describe systems that cannot be repaired or where the continuous operation of the system is critical.
  • 可用性:Availability
    • The fraction of the time that a system meets its specification.
    • The probability that the system is operational at a given time t.
  • 安全性:Safety
    • 当系统暂时地偏离了预期设定, 也不会造成较大事故
  • 可维护性:Maintainability
    • 用于衡量系统维护是否困难

提高系统可信性的途径

  • 通过冗余掩饰系统异常
    • 冗余信息
      • 通过附加额外的比特检测和恢复传输错误
    • 冗余时间
      • 当一个交易中止后, 在不会造成负面影响的情况下重新执行它
    • Physical redundancy
      • 硬件冗余
        • 使用多个服务器降低服务器失效风险
      • 软件冗余
        • 通过相同的方式多次运行
    • 冗余处理
      • 处理组: 所有的成员可以相互接替工作, 动态调整, 扁平化的结构

K容错系统

  • K 容错系统需要的备份数:
    • Fail-silent faults : K+1
    • Byzantine faults : 2K+1 : majority
  • 组的结构: flat/hierarchical
  • Need for managing groups and group membership
    • centralized: group server
    • distributed: totally-ordered reliable multicast

拜占庭问题(Byzantine Problem)

Algorithm to reach agreement. They perform the following :

  1. step 1: every general sends a message to every other general telling his strength ( true or lie)
  2. step 2: each general collects received messages to form a vector
  3. step 3: every general passes his vector to every other general
  4. step 4: each general examines the ith element of each of the newly received vectors.
  5. If any value has a majority, that value is put into the result vector

系统恢复

l3SDc8.png

  • 回退恢复
  • 前向恢复
  • 检查点(Check point)
作者

Cheng

发布于

2019-12-31

更新于

2022-08-06

许可协议

评论