
distributed-computing - What should happen when the leader fails in Multi-Paxos for a master-slave system?

Reposted. Author: 行者123. Updated: 2023-12-04 15:23:15

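Background:
Multi-Paxos is described in Section 3 of Lamport's paper Paxos Made Simple, titled "Implementing a State Machine". Google uses Multi-Paxos, as described in Paxos Made Live. (I first assumed Multi-Paxos is also used in Apache ZooKeeper; the update on Zab below corrects this.) In Multi-Paxos, gaps can appear: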

In general, suppose a leader can get α commands ahead--that is, it can propose commands i + 1 through i + α after commands 1 through i are chosen. A gap of up to α - 1 commands could then arise.
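
To make the arithmetic in the quote concrete, here is a small Python sketch of the worst case. The function names `proposable_window` and `worst_case_gap` are illustrative and not from the paper; the sketch assumes that, out of the α commands the leader may propose ahead, only the last one ends up chosen.

```python
def proposable_window(i, alpha):
    """Instances the leader may propose once commands 1..i are chosen."""
    return list(range(i + 1, i + alpha + 1))


def worst_case_gap(i, alpha):
    """Unchosen instances when only the last command of the window is chosen."""
    window = proposable_window(i, alpha)
    chosen = set(range(1, i + 1)) | {window[-1]}
    return [n for n in window if n not in chosen]
```

With `i = 3` and `alpha = 4`, the window is instances 4 through 7, and the worst-case gap is instances 4, 5, and 6, that is, α - 1 = 3 commands.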


Now consider the following scenario:

The whole system uses a master-slave architecture. Only the master serves client commands. The master and slaves reach consensus on the sequence of commands via Multi-Paxos, and the master is the leader in the Multi-Paxos instances. Assume now that the master and two of its slaves have the states (that is, the commands that have been chosen) shown in the following figure:

[Figure: Master and Slaves]

Note that there is more than one gap in the master's state. Due to asynchrony, the two slaves lag behind. At this point, the master fails.


Questions:
  1. What should the slaves do after they have detected the failure of the master (for example, by a heartbeat mechanism)?

  2. In particular, how should they handle the gaps and the commands they are missing relative to the old master?


Update regarding Zab:
As @sbridges pointed out, ZooKeeper uses Zab rather than Paxos. To quote,

Zab is primarily designed for primary-backup (i.e., master-slave) systems, like ZooKeeper, rather than for state machine replication.


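Zab seems closely related to the questions I listed above. According to the short overview paper of Zab, the Zab protocol consists of two modes: recovery and broadcast. Recovery mode makes two specific guarantees: never forgetting committed messages, and letting go of messages that are skipped. My confusion about Zab is: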
  1. In recovery mode does Zab also suffer from the gaps problem? If so, what does Zab do?

Best answer

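The gaps should be the Paxos instances that have not reached agreement. In the paper Paxos Made Simple, such gaps are filled by proposing a special "no-op" command that leaves the state unchanged.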
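
A minimal Python sketch of this no-op gap filling, assuming a hypothetical `run_instance(i, proposal)` callback that runs a full Paxos instance for slot `i` and returns whatever value ends up chosen. The names `NOOP`, `find_gaps`, and `fill_gaps` are illustrative and do not come from the paper.

```python
NOOP = "no-op"  # special command that leaves the state machine unchanged


def find_gaps(chosen):
    """Return instance numbers below the highest chosen one that have no value."""
    if not chosen:
        return []
    return [i for i in range(1, max(chosen) + 1) if i not in chosen]


def fill_gaps(chosen, run_instance):
    """Run a Paxos instance proposing NOOP for every gap.

    run_instance(i, proposal) is a hypothetical callback. It returns the value
    actually chosen for instance i, which may be a command some acceptor had
    already accepted rather than NOOP.
    """
    log = dict(chosen)
    for i in find_gaps(chosen):
        log[i] = run_instance(i, NOOP)
    return log
```

For example, if the new leader knows `{1: "a", 2: "b", 4: "d", 7: "g"}` and instance 3 had already been accepted elsewhere with value `"c"`, then instance 3 resolves to `"c"` while instances 5 and 6 are filled with `NOOP`.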

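If you care about the order of the values chosen across Paxos instances, you are better off using Zab, because Paxos does not preserve causal order. https://cwiki.apache.org/confluence/display/ZOOKEEPER/PaxosRun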

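The missing commands should be the Paxos instances that have reached agreement but have not yet been learned by the learner. Such a value is immutable, because it has been accepted by a quorum of acceptors. When you run a Paxos instance for that instance id, the acceptors report the accepted value in phase 1b, so the same value is recovered and chosen again.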
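
The phase 1b recovery step can be sketched as follows. This is an illustrative Python sketch, not code from any Paxos implementation; `Promise` and `value_to_propose` are hypothetical names. It assumes each promise carries the highest-numbered proposal the acceptor has accepted so far, if any.

```python
from typing import NamedTuple, Optional, Sequence


class Promise(NamedTuple):
    acceptor: str
    accepted_ballot: Optional[int]  # None if this acceptor has accepted nothing
    accepted_value: Optional[str]


def value_to_propose(promises: Sequence[Promise], quorum: int, own_value: str) -> str:
    """Pick the value the new leader must propose in phase 2a."""
    if len({p.acceptor for p in promises}) < quorum:
        raise ValueError("need promises from a quorum of distinct acceptors")
    accepted = [p for p in promises if p.accepted_ballot is not None]
    if not accepted:
        # No acceptor in the quorum accepted anything: the leader is free to
        # propose its own value (for a gap, the no-op command).
        return own_value
    # Otherwise the leader must adopt the value with the highest ballot.
    return max(accepted, key=lambda p: p.accepted_ballot).accepted_value
```

Because any two quorums intersect, a value already accepted by a quorum shows up in at least one promise, which is why the new leader cannot replace it with a different command.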

When the slaves/followers detect that the leader has failed, or when the leader loses the support of a quorum of slaves/followers, they should elect a new leader.
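
The two triggers mentioned above can be sketched in Python as follows. This is only an illustration under simple assumptions: timestamps are passed in explicitly, a majority counts as a quorum, and `HeartbeatMonitor` and `has_quorum` are hypothetical names rather than part of any real system.

```python
class HeartbeatMonitor:
    """Follower-side view of the leader, based on heartbeat timestamps."""

    def __init__(self, timeout, now):
        self.timeout = timeout
        self.last_seen = now

    def on_heartbeat(self, now):
        # Record the arrival time of the latest heartbeat from the leader.
        self.last_seen = now

    def leader_suspected(self, now):
        # The leader is suspected once no heartbeat arrived within the timeout.
        return now - self.last_seen > self.timeout


def has_quorum(supporters, cluster_size):
    """Leader-side check: True if a strict majority still supports the leader."""
    return supporters > cluster_size // 2
```

A follower whose monitor reports `leader_suspected(...)`, or a leader for which `has_quorum(...)` turns false, would then start a new election. A timeout can only suspect a failure in an asynchronous system; it cannot prove one.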

In ZooKeeper, there should be no gaps, because each follower communicates with the leader over TCP, which preserves FIFO order.

In recovery mode, after the leader is elected, each follower first synchronizes with the leader and applies the modifications to its state until NEWLEADER is received.

In broadcast mode, the follower queues each PROPOSAL in pendingTxns and waits for the COMMIT messages in the same order. If the zxid of a COMMIT does not match the zxid at the head of pendingTxns, the follower exits.
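
The broadcast-mode check can be sketched in Python as follows. This is a simplified illustration, not ZooKeeper's actual code: the `Follower` class, its method names, and the `FollowerExit` exception are hypothetical, and `pending_txns` merely mirrors the pendingTxns queue described above.

```python
from collections import deque


class FollowerExit(Exception):
    """Raised when a COMMIT arrives out of order and the follower must stop."""


class Follower:
    def __init__(self):
        self.pending_txns = deque()  # proposals waiting for COMMIT, in FIFO order
        self.applied = []            # committed (zxid, txn) pairs, in commit order

    def on_proposal(self, zxid, txn):
        # Queue the proposal; it is applied only after its COMMIT arrives.
        self.pending_txns.append((zxid, txn))

    def on_commit(self, zxid):
        # The COMMIT must refer to the oldest pending proposal.
        if not self.pending_txns or self.pending_txns[0][0] != zxid:
            raise FollowerExit(f"COMMIT {zxid} does not match head of pending_txns")
        self.applied.append(self.pending_txns.popleft())
```

Since proposals are committed strictly from the head of the queue, a follower in this sketch can never apply transaction n + 1 without having applied transaction n, so no gap appears in its applied history.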

https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0

A similar question to "distributed-computing - What should happen when the leader fails in Multi-Paxos for a master-slave system?" can be found on Stack Overflow: https://stackoverflow.com/questions/19208997/
