distributed-computing - What to do if the leader fails in Multi-Paxos for master-slave systems?


Reposted by: 行者123 Updated: 2023-12-04 15:23:15

Background:
Multi-Paxos is described in Section 3, "Implementing a State Machine", of Lamport's paper Paxos Made Simple. Google uses Multi-Paxos, as described in Paxos Made Live. (Multi-Paxos is used in Apache ZooKeeper.) In Multi-Paxos, gaps can appear:

In general, suppose a leader can get α commands ahead--that is, it can propose commands i + 1 through i + α after commands 1 through i are chosen. A gap of up to α - 1 commands could then arise.

Now consider the following scenario:

The whole system uses master-slave architecture. Only the master serves client commands. Master and slaves reach consensus on the sequence of commands via Multi-Paxos. The master is the leader in Multi-Paxos instances. Assume now the master and two of its slaves have the states (commands have been chosen) shown in the following figure:

[Figure: the commands chosen so far on the master and on its two slaves; image not available.]

Note that there is more than one gap in the master's state. Because of asynchrony, the two slaves lag behind. At this point, the master fails.

Problem:

What should the slaves do after they have detected the failure of the master (for example, through a heartbeat mechanism)?

In particular, how should they handle the gaps and the commands that are missing relative to the old master's state?

Update about Zab:
As @sbridges has pointed out, ZooKeeper uses Zab rather than Paxos. To quote,

Zab is primarily designed for primary-backup (i.e., master-slave) systems, like ZooKeeper, rather than for state machine replication.

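Zab seems to be closely related to the problem I described above. According to the short overview paper on Zab, the Zab protocol consists of two modes: recovery and broadcast. In recovery mode, two specific guarantees are made: committed messages are never forgotten, and messages that were skipped are let go. My confusion about Zab is: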

In recovery mode does Zab also suffer from the gaps problem? If so, what does Zab do?

Best answer

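The gaps should be the Paxos instances on which no agreement has been reached. In the paper Paxos Made Simple, the gaps are filled by proposing a special "no-op" command that leaves the state unchanged.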
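To make the effect of a no-op concrete, here is a minimal sketch, not taken from the paper. It assumes the log is a dict from instance id to a hypothetical `(op, arg)` command, that the state is a single integer, and that a replica may only apply commands up to its first gap. In a real system each no-op would itself have to be chosen through a Paxos instance; the dict update below merely stands in for that.

```python
# Hypothetical command encoding: ("add", n) adds n to the state,
# NOOP leaves the state unchanged.
NOOP = ("noop", None)

def gaps(log):
    """Instance ids below the highest known instance that have no chosen command."""
    top = max(log, default=0)
    return [i for i in range(1, top + 1) if i not in log]

def apply_prefix(log, state=0):
    """Apply commands in instance order, stopping at the first gap.

    Returns (state, id of the last applied instance).
    """
    i = 1
    while i in log:
        op, arg = log[i]
        if op == "add":
            state += arg
        # A no-op falls through here and leaves the state untouched.
        i += 1
    return state, i - 1

# Instances 3, 5 and 6 were never chosen before the old leader failed.
log = {1: ("add", 1), 2: ("add", 2), 4: ("add", 4), 7: ("add", 7)}
stuck = apply_prefix(log)                      # blocked at the gap after instance 2
filled = {**log, **{i: NOOP for i in gaps(log)}}
done = apply_prefix(filled)                    # can now run through instance 7
```

With the gaps filled, the replica can execute through instance 7, and the no-ops contribute nothing to the final state.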

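If you care about the order of the values chosen across Paxos instances, you are better off using Zab, because Paxos does not preserve causal order. https://cwiki.apache.org/confluence/display/ZOOKEEPER/PaxosRun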

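The missing commands should be the Paxos instances for which agreement has been reached but whose value has not yet been learned by the learners. Such a value is immutable, because it has been accepted by a quorum of acceptors. When you run Paxos for that instance id, the phase 1b replies report the accepted value, so the same value is recovered and chosen again.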
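The recovery rule can be sketched as follows. This is an illustration under stated assumptions, not code from any Paxos implementation: `Promise` is a hypothetical record of one acceptor's phase 1b reply for a single instance, and the list passed in is assumed to come from a quorum of acceptors. The new leader must propose the value with the highest accepted ballot if any reply carries one; only if no reply carries a value is it free to propose something else, such as a no-op.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

NOOP = "no-op"  # hypothetical marker for the state-preserving command

@dataclass
class Promise:
    """One acceptor's phase 1b reply for one instance (hypothetical shape)."""
    accepted_ballot: Optional[int] = None  # None: nothing accepted yet
    accepted_value: Any = None

def value_to_propose(promises: List[Promise], default: Any = NOOP) -> Any:
    """Pick the value for phase 2a from the phase 1b replies of a quorum."""
    accepted = [p for p in promises if p.accepted_ballot is not None]
    if not accepted:
        # No acceptor in the quorum has accepted anything: this is a true gap.
        return default
    # Otherwise the value with the highest accepted ballot must be re-proposed.
    return max(accepted, key=lambda p: p.accepted_ballot).accepted_value

missing = value_to_propose([Promise(), Promise(3, "x"), Promise(5, "y")])
gap = value_to_propose([Promise(), Promise()])
```

Because any two quorums intersect, a value that was already chosen is guaranteed to show up in at least one of the replies, which is why a "missing" command comes back unchanged.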

When the slaves/followers detect that the leader has failed, or when the leader loses the support of a quorum of slaves/followers, they should elect a new leader.
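The two triggers mentioned here can be sketched roughly as below. Everything in this block is an assumption made for illustration: the class and function names are hypothetical, timestamps are passed in explicitly rather than read from a clock, and the acknowledgement count is assumed to include the leader itself.

```python
class HeartbeatMonitor:
    """Follower-side failure detector based on a heartbeat timeout (sketch)."""

    def __init__(self, timeout: float, now: float = 0.0):
        self.timeout = timeout
        self.last_seen = now

    def on_heartbeat(self, now: float) -> None:
        # Record the arrival time of the latest heartbeat from the leader.
        self.last_seen = now

    def leader_suspected(self, now: float) -> bool:
        # Suspect the leader once no heartbeat has arrived within the timeout.
        return now - self.last_seen > self.timeout

def has_quorum(acks: int, cluster_size: int) -> bool:
    """Leader-side check: a strict majority of the cluster still supports it."""
    return acks > cluster_size // 2

monitor = HeartbeatMonitor(timeout=3.0)
monitor.on_heartbeat(now=2.0)
still_alive = monitor.leader_suspected(now=4.5)   # 2.5 s since last heartbeat
suspected = monitor.leader_suspected(now=5.5)     # 3.5 s since last heartbeat
```

A timeout can only suspect a failure, never prove it, so the election that follows must remain safe even if the old leader is in fact still running.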

In ZooKeeper, there should be no gaps, because each follower communicates with the leader over TCP, which preserves FIFO order.

In recovery mode, after a leader is elected, the followers first synchronize with the leader, applying modifications to their state until NEWLEADER is received.

In broadcast mode, a follower queues each PROPOSAL in pendingTxns and waits for the COMMITs to arrive in the same order. If the zxid of a COMMIT does not match the zxid at the head of pendingTxns, the follower exits.
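The follower-side check described here can be sketched as follows. This is not ZooKeeper's actual (Java) code: only the names pendingTxns, PROPOSAL, COMMIT and zxid come from the text above, while the class, its methods and the `FollowerExit` exception are hypothetical, with the exception standing in for the follower exiting.

```python
from collections import deque

class FollowerExit(Exception):
    """Stands in for the follower shutting itself down (hypothetical)."""

class Follower:
    def __init__(self):
        self.pending_txns = deque()  # PROPOSALs queued in arrival (FIFO) order
        self.committed = []          # transactions delivered so far

    def on_proposal(self, zxid, txn):
        # Queue the proposal until the matching COMMIT arrives.
        self.pending_txns.append((zxid, txn))

    def on_commit(self, zxid):
        # A COMMIT must refer to the oldest pending proposal; anything else
        # means a message was lost or reordered, so the follower gives up.
        if not self.pending_txns or self.pending_txns[0][0] != zxid:
            raise FollowerExit(f"COMMIT {zxid} does not match head of pendingTxns")
        self.committed.append(self.pending_txns.popleft())

f = Follower()
f.on_proposal(1, "create /a")
f.on_proposal(2, "create /b")
f.on_commit(1)          # matches the head, so it is delivered
try:
    f.on_commit(3)      # head is zxid 2, so this is rejected
    exited = False
except FollowerExit:
    exited = True
```

Since delivery only ever advances from the head of the queue, a follower either commits a contiguous prefix of the leader's proposals or stops, which is why no gap can appear in its history.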

https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0

Regarding "distributed-computing - What to do if the leader fails in Multi-Paxos for master-slave systems?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19208997/
