apache-kafka - What should be done after a node in a ZooKeeper cluster fails?

Reposted. Author: 行者123. Updated: 2023-12-04 17:49:39
According to https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkMulitServerSetup:

Cross Machine Requirements

For the ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines. Thus, a deployment that consists of three machines can handle one failure, and a deployment of five machines can handle two failures. Note that a deployment of six machines can only handle two failures since three machines is not a majority. For this reason, ZooKeeper deployments are usually made up of an odd number of machines.

To achieve the highest probability of tolerating a failure you should try to make machine failures independent. For example, if most of the machines share the same switch, failure of that switch could cause a correlated failure and bring down the service. The same holds true of shared power circuits, cooling systems, etc.
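
The 2xF+1 arithmetic in the quoted passage can be checked with a few lines of Python. This is only an illustrative sketch: the function names `majority` and `tolerated_failures` are made up for this example and are not part of ZooKeeper.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority still remains."""
    return ensemble_size - majority(ensemble_size)


# Print ensemble size, required majority, and tolerated failures.
for n in (3, 4, 5, 6):
    print(n, majority(n), tolerated_failures(n))
```

Six machines tolerate no more failures than five, which matches the documentation's advice to deploy an odd number of machines.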

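My question is: after a node failure is detected in the ZooKeeper cluster and the cluster is brought back to 2F+1 nodes, what needs to be done? Do we need to restart all the ZooKeeper nodes? Clients also connect to the ZooKeeper cluster; assume we use DNS names and the recovered node uses the same DNS name.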

For example:

10.51.22.89 zookeeper1
10.51.22.126 zookeeper2
10.51.23.216 zookeeper3

If 10.51.22.89 dies and we bring up 10.51.22.90 as zookeeper1, all the nodes can recognize this change.

Best answer

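If you bring up 10.51.22.90 as zookeeper1 (with the same myid file and configuration that 10.51.22.89 had before) and with an empty data directory, the process will connect to the current leader (zookeeper2 or zookeeper3) and copy a snapshot of the data. Once it has initialized successfully, the node will inform the rest of the cluster and you will have 2F+1 nodes again.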
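
One way to confirm that the replacement node has rejoined is to ask each server for its role using ZooKeeper's `srvr` four-letter-word command, whose reply includes a `Mode:` line (for example `leader` or `follower`). The following is a minimal sketch, not an official client: the helper names are hypothetical, port 2181 is assumed to be the client port, and the host names are the ones from the question. On newer ZooKeeper versions four-letter words may need to be allowed through the `4lw.commands.whitelist` setting, so treat availability of `srvr` as an assumption about your deployment.

```python
import socket


def four_letter_word(host: str, port: int = 2181, command: str = "srvr",
                     timeout: float = 5.0) -> str:
    """Send a four-letter-word command to a ZooKeeper server and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply: str):
    """Return the value of the 'Mode:' line in a reply, or None if it is absent."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def ensemble_modes(hosts, port: int = 2181) -> dict:
    """Map each host name to its reported mode, or to an 'unreachable' note."""
    modes = {}
    for host in hosts:
        try:
            modes[host] = parse_mode(four_letter_word(host, port))
        except OSError as exc:  # covers refused connections, timeouts, DNS failures
            modes[host] = f"unreachable ({exc})"
    return modes
```

Calling `ensemble_modes(["zookeeper1", "zookeeper2", "zookeeper3"])` after the new machine has synchronized would be expected to report one leader and two followers. A server that is still initializing may return a reply without a `Mode:` line, in which case `parse_mode` yields `None`.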

Try it yourself while running tail -f on the log files. It won't hurt the cluster, and you'll learn a lot about ZooKeeper internals ;-)

Source: a similar question about "apache-kafka - What should be done after a node in a ZooKeeper cluster fails?" on Stack Overflow: https://stackoverflow.com/questions/46097648/
