
hadoop - How to choose zookeeper and regionserver nodes

Reposted. Author: 可可西里. Updated: 2023-11-01 16:33:29

What is the best practice for setting up the regionservers and the zookeeper quorum?

I have a small hadoop cluster with 16 nodes. Following the example given at http://hbase.apache.org/book/example_config.html, I chose all 16 nodes as region servers and a subset of those nodes as the zookeeper quorum.

However, when a job is launched by a node that is not in the list given in hbase.zookeeper.quorum, I get the following error:

13/08/23 15:40:05 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
13/08/23 15:40:05 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
13/08/23 15:40:05 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
13/08/23 15:40:05 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
13/08/23 15:40:05 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
13/08/23 15:40:05 INFO util.RetryCounter: Sleeping 2000ms before retry #1...

It keeps trying to connect for 600 seconds and then fails with:

Task attempt_xxx failed to report status for 60 seconds. Killing!

After a few attempts the task is moved to another node, and if the new node happens to be in the zookeeper list, the job completes successfully.

Is this normal?

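I ended up adding all nodes to the zookeeper list, but I would like to know whether this is good practice. Should the region server list ever differ from the list of nodes?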

Thanks

Best answer

No, what you are doing does not look like good practice. For a cluster of 16 region servers, a single ZK node should be fine.

From the ZK Admin guide:

For the ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines. Thus, a deployment that consists of three machines can handle one failure, and a deployment of five machines can handle two failures. Note that a deployment of six machines can only handle two failures since three machines is not a majority. For this reason, ZooKeeper deployments are usually made up of an odd number of machines.
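The arithmetic in the quoted passage can be checked with a few lines of Python: a strict majority of n servers is n // 2 + 1, so an n-server ensemble survives (n - 1) // 2 failures, and tolerating F failures takes 2F + 1 servers. The function names are illustrative only and are not part of any ZooKeeper API.

```python
def majority(n: int) -> int:
    """Smallest number of servers that is a strict majority of an n-server ensemble."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Failures an n-server ensemble survives while a majority remains: floor((n - 1) / 2)."""
    return (n - 1) // 2


def ensemble_size_for(failures: int) -> int:
    """Servers needed to keep a majority after `failures` machines fail: 2F + 1."""
    return 2 * failures + 1


if __name__ == "__main__":
    # Print the majority size and failure tolerance for ensembles of 1..7 servers.
    for n in range(1, 8):
        print(f"{n} servers: majority={majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Its output shows that going from 5 to 6 servers raises the majority from 3 to 4 without tolerating any additional failure, which is the passage's argument for odd-sized ensembles.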

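Although the guide does not say so, a ZK ensemble should not have more than 7 nodes. Combined with the advice to use an odd number of nodes, that leaves 1, 3, 5 or 7. For a small cluster like yours, 1 should be enough, but 3 will give you resilience. 5 is probably overkill; 7 definitely is.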
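A minimal sketch of the layout discussed here for the 16-node cluster in the question: all 16 nodes listed as region servers and a 3-node ZooKeeper quorum. The hostnames node01..node16 are hypothetical, and picking the first three nodes as the quorum is an arbitrary choice. The property names hbase.zookeeper.quorum and hbase.zookeeper.property.clientPort (2181 being ZooKeeper's default client port) and the one-host-per-line conf/regionservers file are standard HBase configuration; generating them from Python is only for illustration.

```python
import xml.etree.ElementTree as ET


def hbase_site_xml(quorum_hosts, client_port=2181):
    """Build an hbase-site.xml <configuration> element holding the ZooKeeper quorum settings."""
    conf = ET.Element("configuration")
    for name, value in (
        ("hbase.zookeeper.quorum", ",".join(quorum_hosts)),
        ("hbase.zookeeper.property.clientPort", str(client_port)),
    ):
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")


def regionservers_file(hosts):
    """One hostname per line, the format of HBase's conf/regionservers file."""
    return "\n".join(hosts) + "\n"


# Hypothetical hostnames node01 .. node16; every node runs a region server.
nodes = [f"node{i:02d}" for i in range(1, 17)]
# An odd-sized ensemble of 3, as suggested for resilience (arbitrary pick).
zk_quorum = nodes[:3]

if __name__ == "__main__":
    print(hbase_site_xml(zk_quorum))
    print(regionservers_file(nodes))
```

The same hbase-site.xml would then be copied to every node, not only to the quorum members, so that a job started anywhere sees the same quorum list.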

Also, looking at the error you pasted:

java.net.ConnectException: Connection refused

This seems to point to either:

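  • a Hadoop misconfiguration: you are pointing at the wrong server/port, or the service is not currently running; or, more likely,
  • a network misconfiguration, such as a firewall like iptables running on the nodes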
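To tell these cases apart, it can help to test plain TCP reachability of each quorum host from a node outside the quorum. Note that the log shows the client trying localhost:2181 rather than a host from the quorum list; an HBase client that does not pick up hbase.zookeeper.quorum falls back to its default of localhost, which would fit the "wrong server/port" case and would also be consistent with jobs succeeding only on quorum nodes. The sketch below is a hypothetical helper, not an HBase or ZooKeeper tool: `parse_quorum` and `probe` are illustrative names, and mapping "refused" and "timeout" to causes is a rule of thumb (a firewall that silently drops packets typically shows up as a timeout, while a port with nothing listening, or a REJECT rule, typically shows up as refused).

```python
import socket


def parse_quorum(quorum: str) -> list[str]:
    """Split a comma-separated hbase.zookeeper.quorum value into hostnames."""
    return [h.strip() for h in quorum.split(",") if h.strip()]


def probe(host: str, port: int = 2181, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port (2181 is ZooKeeper's default client port)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered, but nothing is listening on the port or a rule rejects it.
        return "refused"
    except socket.timeout:
        # No answer at all: often a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        # Name resolution failure, no route to host, and similar errors.
        return f"error: {exc}"


if __name__ == "__main__":
    # Hypothetical quorum value; substitute the one from your hbase-site.xml.
    for host in parse_quorum("localhost"):
        print(host, probe(host))
```

Running it on a failing node with the real quorum value would show whether the quorum hosts are reachable at all, independently of how the job picks up its configuration.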

A similar question about hadoop - How to choose zookeeper and regionserver can be found on Stack Overflow: https://stackoverflow.com/questions/18404855/
