
ssh - Spark worker won't bind to the master

Reposted. Author: 行者123. Updated: 2023-12-04 19:46:34

When I start my Spark worker I get an error that is probably related to whether the slave machine can reach the master, but I'm not sure.

16/02/12 23:47:13 INFO Utils: Successfully started service 'sparkWorker' on port 38019.
16/02/12 23:47:13 INFO Worker: Starting Spark worker 192.168.0.38:38019 with 8 cores, 26.5 GB RAM
16/02/12 23:47:13 INFO Worker: Running Spark version 1.6.0
16/02/12 23:47:13 INFO Worker: Spark home: /home/romain/spark-1.6.0-bin-hadoop2.6
16/02/12 23:47:13 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
16/02/12 23:47:13 INFO WorkerWebUI: Started WorkerWebUI at http://192.168.0.38:8081
16/02/12 23:47:13 INFO Worker: Connecting to master 192.168.0.39:7078...
16/02/12 23:47:13 WARN Worker: Failed to connect to master 192.168.0.39:7078
java.io.IOException: Failed to connect to /192.168.0.39:7078
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.0.39:7078
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more

On the master I can see that it is up and running:

16/02/12 23:30:30 WARN Utils: Your hostname, pl resolves to a loopback address: 127.0.1.1; using 192.168.0.39 instead (on interface eth0)
16/02/12 23:30:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/12 23:30:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/12 23:30:31 INFO SecurityManager: Changing view acls to: romain
16/02/12 23:30:31 INFO SecurityManager: Changing modify acls to: romain
16/02/12 23:30:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(romain); users with modify permissions: Set(romain)
16/02/12 23:30:31 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
16/02/12 23:30:31 INFO Utils: Successfully started service 'sparkMaster' on port 7078.
16/02/12 23:30:31 INFO Master: Starting Spark master at spark://pl:7078
16/02/12 23:30:31 INFO Master: Running Spark version 1.6.0
16/02/12 23:30:32 INFO Utils: Successfully started service 'MasterUI' on port 3094.
16/02/12 23:30:32 INFO MasterWebUI: Started MasterWebUI at http://192.168.0.39:3094
16/02/12 23:30:32 WARN Utils: Service could not bind on port 6066. Attempting port 6067.
16/02/12 23:30:32 INFO Utils: Successfully started service on port 6067.
16/02/12 23:30:32 INFO StandaloneRestServer: Started REST server for submitting applications on port 6067
16/02/12 23:30:32 INFO Master: I have been elected leader! New state: ALIVE
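
Two warnings in this log may matter: the hostname `pl` resolves to the loopback address 127.0.1.1, and port 7077 was already taken, so the master moved to 7078. One way to see which local address the master is really listening on is to filter the output of `ss -ltn` on the master machine. The helper below is a minimal sketch: `listeners_for_port` is a hypothetical name, and it assumes the usual `ss -ltn` layout with a header row and the local address in the fourth column.

```shell
# listeners_for_port PORT
# Reads `ss -ltn` output on stdin and prints every local address:port
# entry whose port equals PORT. The header row is skipped.
listeners_for_port() {
    awk -v port="$1" 'NR > 1 {
        n = split($4, parts, ":")      # the port is the last ":"-separated field
        if (parts[n] == port) print $4
    }'
}

# Example, run on the master machine:
#   ss -ltn | listeners_for_port 7078
```

If the only match is a loopback address such as 127.0.1.1:7078, connections to 192.168.0.39:7078 would likely be refused even though the master reports ALIVE. Running the same filter with 7077 would show whether something else is still listening on the default port.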

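From browsing blogs and other pages, it seems a secured network may be required. I did install a passwordless ssh key, but for the user "romain". Under which user does Spark launch? The command-line user, I guess.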

Should I check the network? Following this page, "Spark worker can not connect to Master", I tried:

telnet 192.168.0.39
Trying 192.168.0.39...
telnet: Unable to connect to remote host: Connection refused
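
Note that `telnet 192.168.0.39` without a port argument targets the default telnet port (23), so this refusal says nothing about the Spark master port. A minimal sketch of a check against the port from the worker log (7078) follows. `check_port` is a hypothetical helper, and it assumes `bash` (for its `/dev/tcp` redirection) and the coreutils `timeout` command are available.

```shell
# check_port HOST PORT
# Succeeds (exit 0) if a TCP connection to HOST:PORT can be opened within 3 seconds.
check_port() {
    # bash treats /dev/tcp/HOST/PORT in a redirection as "open a TCP connection".
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null; then
        echo "open: $1:$2"
    else
        echo "closed or unreachable: $1:$2"
        return 1
    fi
}

# Example, run from the worker machine (address and port taken from the logs above):
#   check_port 192.168.0.39 7078
```

A successful ping only shows that the host answers ICMP. It does not show that any process is accepting TCP connections on a given port, which is what this check tests.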

But ping works:

romain@wk:~/spark-1.6.0-bin-hadoop2.6$ ping 192.168.0.39
PING 192.168.0.39 (192.168.0.39) 56(84) bytes of data.
64 bytes from 192.168.0.39: icmp_seq=1 ttl=64 time=0.233 ms
64 bytes from 192.168.0.39: icmp_seq=2 ttl=64 time=0.185 ms
^C
--- 192.168.0.39 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.185/0.209/0.233/0.024 ms

I do have a passwordless ssh connection:

$ ssh 192.168.0.39
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.19.0-49-generic x86_64)
$

What should I do to make the connection possible?

Best answer

I was able to get my Spark worker working by setting the variable SPARK_LOCAL_IP=127.0.0.1.

  1. You can define it as a local bash environment variable in ~/.bashrc.
  2. You can copy $SPARK_HOME/conf/spark-env.sh.template to conf/spark-env.sh and define it there.

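In a cluster environment it is better to use the machine's local (LAN) IP address instead of 127.0.0.1. That way you will also be able to see the worker node UI.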
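
As a sketch of the second option with a LAN address in place of 127.0.0.1: the addresses below are assumptions taken from the logs in the question (192.168.0.38 for the worker, 192.168.0.39 for the master) and need to be adjusted to your own machines.

```shell
# $SPARK_HOME/conf/spark-env.sh on the worker machine
# (created by copying conf/spark-env.sh.template).
# Assumption: 192.168.0.38 is the worker's LAN address, as in the question's worker log.
export SPARK_LOCAL_IP=192.168.0.38

# On the master machine the same file would carry the master's address instead
# (assumed from the question's master log):
# export SPARK_LOCAL_IP=192.168.0.39
```

The file is read when the Spark daemons start, so the master and worker need to be restarted after editing it.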

Regarding "ssh - Spark worker won't bind to the master", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35373607/
