
linux - PBS communication error: Nodes can not communicate

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:16:56

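I installed the PBS server successfully, started the services, and can see the nodes with the pbsnodes command. The queues show up correctly in the output of qstat -q. After I submit a test job, the following appears in my sched_log, my server_log, and the mom_log file on the MOM nodes: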

sched_log:

08/16/2017 14:18:48.476;64; pbs_sched.19885;Job;2.headnode;Job Run
08/16/2017 14:19:28.215;02; pbs_sched.19885;Req;headnode3;Can not open connection to mom
08/16/2017 14:19:28.215;02; pbs_sched.19885;Req;headnode4;Can not open connection to mom
08/16/2017 14:19:28.238;02; pbs_sched.19885;Req;headnode5;Can not open connection to mom
08/16/2017 14:19:28.239;02; pbs_sched.19885;Req;headnode6;Can not open connection to mom

server_log:

08/16/2017 14:40:37.829;01;PBS_Server.27737;Svr;PBS_Server;LOG_ERROR::tcp_connect_sockaddr, Failed when trying to open tcp connection - connect() failed [rc = -2] [addr = 192.168.89.233:15003]
08/16/2017 14:40:37.829;01;PBS_Server.27739;Svr;PBS_Server;LOG_ERROR::tcp_connect_sockaddr, Failed when trying to open tcp connection - connect() failed [rc = -2] [addr = 192.168.89.232:15003]
08/16/2017 14:40:37.829;01;PBS_Server.27793;Svr;PBS_Server;LOG_ERROR::tcp_connect_sockaddr, Failed when trying to open tcp connection - connect() failed [rc = -2] [addr = 192.168.89.235:15003]
08/16/2017 14:40:38.828;01;PBS_Server.27736;Svr;PBS_Server;LOG_ERROR::tcp_connect_sockaddr, Failed when trying to open tcp connection - connect() failed [rc = -2] [addr = 192.168.89.234:15003]

mom_log:

08/16/2017 18:50:36.215;01;   pbs_mom.10833;Svr;pbs_mom;LOG_ERROR::send_update_to_a_server, Status not successfully updated for 11123 MOM status update intervals
08/16/2017 18:51:22.308;01; pbs_mom.10838;Svr;pbs_mom;LOG_ERROR::send_update_to_a_server, Could not contact any of the servers to send an update
08/16/2017 18:51:22.308;01; pbs_mom.10838;Svr;pbs_mom;LOG_ERROR::send_update_to_a_server, Status not successfully updated for 11124 MOM status update intervals
08/16/2017 18:52:06.402;01; pbs_mom.10859;Svr;pbs_mom;LOG_ERROR::send_update_to_a_server, Status update successfully sent after 11124 MOM status update intervals
08/16/2017 18:53:21.555;02; pbs_mom.13039;Svr;pbs_mom;Torque Mom Version = 6.1.1.1, loglevel = 0
08/16/2017 18:58:26.182;02; pbs_mom.13039;Svr;pbs_mom;Torque Mom Version = 6.1.1.1, loglevel = 0
08/16/2017 19:03:31.815;02; pbs_mom.13039;Svr;pbs_mom;Torque Mom Version = 6.1.1.1, loglevel = 0
08/16/2017 19:08:31.407;02; pbs_mom.13039;Svr;pbs_mom;Torque Mom Version = 6.1.1.1, loglevel = 0
08/16/2017 19:13:37.039;02; pbs_mom.13039;Svr;pbs_mom;Torque Mom Version = 6.1.1.1, loglevel = 0
08/16/2017 19:18:41.670;02; pbs_mom.13039;Svr;pbs_mom;Torque Mom Version = 6.1.1.1, loglevel = 0
08/16/2017 19:23:46.455;02; pbs_mom.13039;Svr;pbs_mom;Torque Mom Version = 6.1.1.1, loglevel = 0

How can I solve this? Is it caused by some kind of authentication failure? If so, should I set up SSH key authentication for login?

Interestingly, I have another server running Torque, named headnode2 with IP .89.231, that shows no errors at all. I did not follow any extra steps to configure that one.

Best answer

You probably just need to configure your firewall. I would run

# iptables-save > iptables.bak && iptables -F

on the server and on one test node, then submit a job to that node and see whether it runs.
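
Before (or instead of) flushing every rule, it can help to confirm that the ports are actually unreachable. The following is a minimal sketch, not part of the original answer. It assumes bash with /dev/tcp support, the coreutils timeout command, and that Torque is using its default ports (15001 for pbs_server, 15002 and 15003 for pbs_mom, 15004 for pbs_sched). The function names port_open, check_mom_ports and print_allow_rules are hypothetical helpers introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: helper functions for diagnosing a firewall between
# pbs_server and pbs_mom. Port numbers are the Torque defaults
# (an assumption; adjust if your installation overrides them).

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Report whether the two default pbs_mom ports on a node are reachable.
check_mom_ports() {
    local host=$1 port
    for port in 15002 15003; do
        if port_open "$host" "$port"; then
            echo "$host:$port reachable"
        else
            echo "$host:$port blocked or closed"
        fi
    done
}

# Print (do not execute) a narrower iptables rule that admits Torque
# traffic from one subnet, as an alternative to leaving the rules flushed.
print_allow_rules() {
    local subnet=$1
    echo "iptables -I INPUT -p tcp -s ${subnet} --dport 15001:15004 -j ACCEPT"
}
```

Running check_mom_ports headnode3 on the server would show whether the port 15003 connection from the server log fails outside of Torque as well. If the flush test makes jobs run, the saved rules can be brought back with iptables-restore < iptables.bak, and a rule like the one printed by print_allow_rules 192.168.89.0/24 (the subnet is inferred from the addresses in the server log) could then be reviewed and applied on the server and the nodes.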

Regarding linux - PBS communication error: Nodes can not communicate, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45709837/
