apache-spark - Spark - Master: got disassociated, removing it

Reposted. Author: 行者123. Updated: 2023-12-02 11:47:16

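I am deploying a Spark cluster with 1 master node and 3 worker nodes. As soon as the master and the workers are deployed, the master starts spamming its log with the following messages: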

19/07/17 12:56:51 INFO Master: I have been elected leader! New state: ALIVE
19/07/17 12:56:56 INFO Master: Registering worker 172.26.140.209:35803 with 1 cores, 2.0 GB RAM
19/07/17 12:56:57 INFO Master: 172.26.140.163:59146 got disassociated, removing it.
19/07/17 12:56:58 INFO Master: 172.26.140.132:56252 got disassociated, removing it.
19/07/17 12:56:58 INFO Master: 172.26.140.194:62135 got disassociated, removing it.
19/07/17 12:57:02 INFO Master: Registering worker 172.26.140.169:44249 with 1 cores, 2.0 GB RAM
19/07/17 12:57:02 INFO Master: 172.26.140.163:59202 got disassociated, removing it.
19/07/17 12:57:03 INFO Master: 172.26.140.132:56355 got disassociated, removing it.
19/07/17 12:57:03 INFO Master: 172.26.140.194:62157 got disassociated, removing it.
19/07/17 12:57:07 INFO Master: 172.26.140.163:59266 got disassociated, removing it.
19/07/17 12:57:08 INFO Master: 172.26.140.132:56376 got disassociated, removing it.
19/07/17 12:57:08 INFO Master: Registering worker 172.26.140.204:43921 with 1 cores, 2.0 GB RAM
19/07/17 12:57:08 INFO Master: 172.26.140.194:62203 got disassociated, removing it.
19/07/17 12:57:12 INFO Master: 172.26.140.163:59342 got disassociated, removing it.
19/07/17 12:57:13 INFO Master: 172.26.140.132:56392 got disassociated, removing it.
19/07/17 12:57:13 INFO Master: 172.26.140.194:62268 got disassociated, removing it.
19/07/17 12:57:17 INFO Master: 172.26.140.163:59417 got disassociated, removing it.
19/07/17 12:57:18 INFO Master: 172.26.140.132:56415 got disassociated, removing it.
19/07/17 12:57:18 INFO Master: 172.26.140.194:62296 got disassociated, removing it.
19/07/17 12:57:22 INFO Master: 172.26.140.163:59472 got disassociated, removing it.
19/07/17 12:57:23 INFO Master: 172.26.140.132:56483 got disassociated, removing it.
19/07/17 12:57:23 INFO Master: 172.26.140.194:62323 got disassociated, removing it.

The worker nodes appear to connect to the master correctly and log the following:
19/07/17 12:56:56 INFO Utils: Successfully started service 'sparkWorker' on port 35803.
19/07/17 12:56:56 INFO Worker: Starting Spark worker 172.26.140.209:35803 with 1 cores, 2.0 GB RAM
19/07/17 12:56:56 INFO Worker: Running Spark version 2.4.3
19/07/17 12:56:56 INFO Worker: Spark home: /opt/spark
19/07/17 12:56:56 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
19/07/17 12:56:56 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://spark-worker-0.spark-worker-service.default.svc.cluster.local:8081
19/07/17 12:56:56 INFO Worker: Connecting to master spark-master-service.default.svc.cluster.local:7077...
19/07/17 12:56:56 INFO TransportClientFactory: Successfully created connection to spark-master-service.default.svc.cluster.local/10.0.179.236:7077 after 49 ms (0 ms spent in bootstraps)
19/07/17 12:56:56 INFO Worker: Successfully registered with master spark://172.26.140.196:7077

However, the master still logs the disassociation message for three separate nodes every 5 seconds.
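The five-second cadence can be checked directly from the log rather than by eye. The following is a minimal sketch, not part of the original question: the function name `disassociation_gaps` and the regular expression are illustrative, and the sample lines are copied from the master log above.

```python
import re
from collections import defaultdict
from datetime import datetime

# A few lines copied verbatim from the master log in the question.
LOG = """\
19/07/17 12:56:57 INFO Master: 172.26.140.163:59146 got disassociated, removing it.
19/07/17 12:56:58 INFO Master: 172.26.140.132:56252 got disassociated, removing it.
19/07/17 12:57:02 INFO Master: 172.26.140.163:59202 got disassociated, removing it.
19/07/17 12:57:03 INFO Master: 172.26.140.132:56355 got disassociated, removing it.
19/07/17 12:57:07 INFO Master: 172.26.140.163:59266 got disassociated, removing it.
"""

# Captures the timestamp and the IP (without the ephemeral port).
PATTERN = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO Master: "
    r"(\d+\.\d+\.\d+\.\d+):\d+ got disassociated"
)


def disassociation_gaps(log_text):
    """Return {ip: [seconds between consecutive disassociation messages]}."""
    seen = defaultdict(list)
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match:
            # Spark's default log timestamp is yy/MM/dd HH:mm:ss.
            stamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
            seen[match.group(2)].append(stamp)
    return {
        ip: [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
        for ip, stamps in seen.items()
    }


gaps = disassociation_gaps(LOG)
print(gaps)
```

Each address reconnects from a new source port every time, which is why the sketch groups by IP only and ignores the port.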

Strangely, the IP addresses listed in the master's log all belong to the kube-proxy pods:
kube-system   kube-proxy-5vp9r   1/1   Running   0   39h   172.26.140.163   aks-agentpool-31454219-2   <none>   <none>
kube-system   kube-proxy-kl695   1/1   Running   0   39h   172.26.140.132   aks-agentpool-31454219-1   <none>   <none>
kube-system   kube-proxy-xgjws   1/1   Running   0   39h   172.26.140.194   aks-agentpool-31454219-0   <none>   <none>
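The match between the two listings can be confirmed mechanically. This is a rough sketch under stated assumptions: the pod listing is assumed to come from something like `kubectl get pods --all-namespaces -o wide` (the question does not name the command), and the helper names `disassociated_ips` and `pod_ips` are hypothetical.

```python
import re

# Sample lines copied from the master log in the question.
MASTER_LOG = """\
19/07/17 12:56:57 INFO Master: 172.26.140.163:59146 got disassociated, removing it.
19/07/17 12:56:58 INFO Master: 172.26.140.132:56252 got disassociated, removing it.
19/07/17 12:56:58 INFO Master: 172.26.140.194:62135 got disassociated, removing it.
"""

# Pod listing copied from the question.
POD_LISTING = """\
kube-system kube-proxy-5vp9r 1/1 Running 0 39h 172.26.140.163 aks-agentpool-31454219-2 <none> <none>
kube-system kube-proxy-kl695 1/1 Running 0 39h 172.26.140.132 aks-agentpool-31454219-1 <none> <none>
kube-system kube-proxy-xgjws 1/1 Running 0 39h 172.26.140.194 aks-agentpool-31454219-0 <none> <none>
"""


def disassociated_ips(log_text):
    """Collect the distinct IPs that the master reports as disassociated."""
    return set(
        re.findall(r"Master: (\d+\.\d+\.\d+\.\d+):\d+ got disassociated", log_text)
    )


def pod_ips(listing, name_prefix):
    """Map pod IP -> pod name for pods whose name starts with name_prefix."""
    result = {}
    for line in listing.splitlines():
        cols = line.split()
        # Assumed column order: NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE ...
        if len(cols) >= 8 and cols[1].startswith(name_prefix):
            result[cols[6]] = cols[1]
    return result


proxies = pod_ips(POD_LISTING, "kube-proxy")
unmatched = disassociated_ips(MASTER_LOG) - set(proxies)
print(proxies)
print(unmatched)
```

An empty `unmatched` set means every disassociated address belongs to a kube-proxy pod. Since kube-proxy typically runs on the host network, these are probably also the addresses of the nodes themselves, so the connections may originate from the nodes rather than from kube-proxy as such.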

I have two questions:

1) Why are the kube-proxy pods connecting to the master? Or why does the master think the kube-proxy pods are participating in this cluster?

2) What setting do I need to change to remove this message from my log files?

Here are the contents of my spark-defaults.conf file:
spark.master=spark://spark-master-service:7077
spark.submit.deploy-mode=cluster
spark.executor.cores=1
spark.driver.memory=500m
spark.executor.memory=500m
spark.eventLog.enabled=true
spark.eventLog.dir=/mnt/eventLog

I cannot find any sensible reason why this is happening, and any help would be greatly appreciated.

Best answer

I ran into the same problem with my Spark cluster on Kubernetes. I tested Spark 2.4.3 and Spark 2.4.4, with Kubernetes 1.16.0 and 1.13.0.

Here is the solution:

This is how I created my Spark session at first:

spark = SparkSession.builder.appName('Kubernetes-Spark-app').getOrCreate()

Using the cluster IP of the Spark master solved the problem:
spark = SparkSession.builder.master('spark://10.0.106.83:7077').appName('Kubernetes-Spark-app').getOrCreate()
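Hard-coding the cluster IP works, but the address changes if the Service is recreated. One possible alternative, sketched here as an assumption rather than as part of the original answer, is to read the cluster IP from the environment variables that Kubernetes injects into pods for each Service (`<NAME>_SERVICE_HOST` and `<NAME>_SERVICE_PORT`, with the name upper-cased and dashes replaced by underscores). The helper `spark_master_url` is hypothetical, and the Service name `spark-master-service` is taken from the question, not from the Helm chart.

```python
import os


def spark_master_url(service_name, env=None, fallback_port=7077):
    """Build a spark:// URL from the Service variables Kubernetes injects.

    Falls back to the Service DNS name and the given port when the
    variables are absent (for example, outside the cluster).
    """
    env = os.environ if env is None else env
    prefix = service_name.upper().replace("-", "_")
    host = env.get(f"{prefix}_SERVICE_HOST", service_name)
    port = env.get(f"{prefix}_SERVICE_PORT", str(fallback_port))
    return f"spark://{host}:{port}"


# Simulated pod environment; inside a real pod, omit the env argument.
example_env = {
    "SPARK_MASTER_SERVICE_SERVICE_HOST": "10.0.106.83",
    "SPARK_MASTER_SERVICE_SERVICE_PORT": "7077",
}
url = spark_master_url("spark-master-service", example_env)
print(url)  # spark://10.0.106.83:7077
```

The result can be passed to `SparkSession.builder.master(...)` in place of the literal address. Note that these variables are only set for pods in the same namespace that start after the Service exists, so the fallback branch may be hit otherwise.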

This was used together with the following chart:
helm install microsoft/spark --generate-name

For more on apache-spark - Spark - Master: got disassociated, removing it, see the similar question on Stack Overflow: https://stackoverflow.com/questions/57078430/
