
linux - Two MPI compute nodes unable to complete a TCP connection, caused by a firewall

Reposted. Author: 行者123. Updated: 2023-12-04 18:56:50

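I am trying to run a simple MPI example on two compute nodes, node1 and node2, which are virtual machines I just created on Oracle Cloud. (This is my first time using Oracle Cloud...) The system is Ubuntu 20.04. What I have done so far: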

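  • node1 and node2 have a correct MPI environment (OpenMPI-4.1.0) under the same path. $PATH and $LD_LIBRARY_PATH are also set. I can run the MPI example successfully on a single node.
  • Passwordless login between node1 and node2 is set up. I can connect from either node to the other with ssh node1 and ssh node2.
  • A host file (hosts2) exists on both nodes under the same path ($HOSTFILE_PATH/hosts2), containing:
    node1 slots=1
    node2 slots=1
  • The executable (test) is located under the same path ($EXE_PATH/test).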

  • Then I ran $(which mpirun) -n 2 -hostfile $HOSTFILE_PATH/hosts2 $EXE_PATH/test and got nothing back, so I could only terminate the run with Ctrl+C. A few minutes later, I got this output:
     ------------------------------------------------------------
    A process or daemon was unable to complete a TCP connection
    to another process:
    Local host: instance-1-632783
    Remote host: instance-1
    This is usually caused by a firewall on the remote host. Please
    check that any firewall (e.g., iptables) has been disabled and
    try again.
    ------------------------------------------------------------
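To tell a blocked port apart from an MPI misconfiguration, it can help to probe a TCP port on the other node directly. The following is only a sketch: probe_tcp is a hypothetical helper, port 50000 is an arbitrary example, and it assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are installed.

```shell
# probe_tcp HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
# Hypothetical helper; relies on bash's /dev/tcp and coreutils' timeout.
probe_tcp() {
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example usage (not run here; 50000 is an arbitrary placeholder port):
# probe_tcp node2 22    && echo "port 22 reachable"
# probe_tcp node2 50000 || echo "port 50000 blocked or nothing listening"
```

A failed probe alone does not prove a firewall is involved, since a port with no listener is refused as well. Comparing port 22 with a port where a process is known to be listening on the other node gives a clearer picture.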
    Is the problem related to the firewall? I tried sudo ufw status and got Status: inactive. I also tried sudo iptables -L and got:
    Chain INPUT (policy ACCEPT)
    target prot opt source destination
    ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
    ACCEPT icmp -- anywhere anywhere
    ACCEPT all -- anywhere anywhere
    ACCEPT udp -- anywhere anywhere udp spt:ntp
    ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:ssh
    REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

    Chain FORWARD (policy ACCEPT)
    target prot opt source destination
    REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

    Chain OUTPUT (policy ACCEPT)
    target prot opt source destination
    InstanceServices all -- anywhere link-local/16

    Chain InstanceServices (1 references)
    target prot opt source destination
    ACCEPT tcp -- anywhere 169.254.0.2 owner UID match root tcp dpt:iscsi-target /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
    ACCEPT tcp -- anywhere 169.254.2.0/24 owner UID match root tcp dpt:iscsi-target /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
    ACCEPT tcp -- anywhere 169.254.4.0/24 owner UID match root tcp dpt:iscsi-target /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
    ACCEPT tcp -- anywhere 169.254.5.0/24 owner UID match root tcp dpt:iscsi-target /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
    ACCEPT tcp -- anywhere 169.254.0.2 tcp dpt:http /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
    ACCEPT udp -- anywhere 169.254.169.254 udp dpt:domain /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
    ACCEPT tcp -- anywhere 169.254.169.254 tcp dpt:domain /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
    ACCEPT tcp -- anywhere 169.254.0.3 owner UID match root tcp dpt:http /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
    ACCEPT tcp -- anywhere 169.254.0.4 tcp dpt:http /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
    ACCEPT tcp -- anywhere 169.254.169.254 tcp dpt:http /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
    ACCEPT udp -- anywhere 169.254.169.254 udp dpt:bootps /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
    ACCEPT udp -- anywhere 169.254.169.254 udp dpt:tftp /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
    ACCEPT udp -- anywhere 169.254.169.254 udp dpt:ntp /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
    REJECT tcp -- anywhere link-local/16 tcp /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */ reject-with tcp-reset
    REJECT udp -- anywhere link-local/16 udp /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */ reject-with icmp-port-unreachable
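In the INPUT chain listed above, rules are evaluated top to bottom: new connections are accepted only for SSH, and the final REJECT rule refuses everything else, which would include whatever ports MPI picks. (The third rule looks like accept-all, but on images like this it is likely limited to the loopback interface; adding -v to the listing shows the interface column.) A small sketch for finding the position of that REJECT rule follows. first_reject_line is a hypothetical helper name, and it assumes the column layout printed by iptables -L INPUT -n --line-numbers.

```shell
# first_reject_line
# Reads an `iptables -L <chain> -n --line-numbers` listing on stdin and prints
# the number of the first REJECT rule (prints nothing if there is none).
# Hypothetical helper; assumes the "num target ..." column layout.
first_reject_line() {
    awk '$1 ~ /^[0-9]+$/ && $2 == "REJECT" { print $1; exit }'
}

# Example usage (not run here):
# sudo iptables -L INPUT -n --line-numbers | first_reject_line
```

Any ACCEPT rule meant to let MPI traffic through has to sit above the rule number this reports.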
    I then tried sudo iptables -F, after which sudo iptables -L shows:
    Chain INPUT (policy ACCEPT)
    target prot opt source destination

    Chain FORWARD (policy ACCEPT)
    target prot opt source destination

    Chain OUTPUT (policy ACCEPT)
    target prot opt source destination

    Chain InstanceServices (0 references)
    target prot opt source destination
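    But sudo iptables -F seems to remove the rules only temporarily. After I reboot the system, sudo iptables -L shows the previous output again. So how do I solve the firewall problem? Should I remove these rules permanently? If so, how?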
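If the instance's own iptables rules turn out to be part of the problem, a narrower alternative to flushing everything is to accept TCP traffic from the subnet only and then save the rule set so it survives a reboot. The sketch below is an assumption-laden illustration, not a confirmed fix: 10.0.0.0/24 is a placeholder CIDR, run and open_subnet are hypothetical helpers, and it assumes the image persists rules through netfilter-persistent. It prints the commands by default and only executes them when DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# open_subnet CIDR
# Hypothetical helper: accept TCP from CIDR, then persist the rule set.
open_subnet() {
    # Insert at position 1 so the rule is evaluated before the final REJECT.
    run sudo iptables -I INPUT 1 -s "$1" -p tcp -j ACCEPT
    # Assumption: rules are persisted with netfilter-persistent on this image.
    run sudo netfilter-persistent save
}

# Example usage (placeholder CIDR; run on both nodes):
# open_subnet 10.0.0.0/24              # preview only
# DRY_RUN=0 open_subnet 10.0.0.0/24    # actually apply
```

Previewing first makes it easy to check the rule before touching a firewall that also carries the Oracle-provided rules shown earlier.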

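    Best answer

    Even if the virtual machines are in the same subnet, you still have to allow traffic between them.
    So open the required ports in the security list of the subnet you are using (https://docs.oracle.com/en-us/iaas/Content/Network/Concepts/securitylists.htm#Security_Lists).
    If you do not know which ports are required, you can open all ports (this is not good practice for a production environment).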
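One way to avoid opening all ports is to confine Open MPI to a known port range and open only that range in the security list. The sketch below assumes the MCA parameter names used by Open MPI 4.x (btl_tcp_port_min_v4, btl_tcp_port_range_v4, oob_tcp_dynamic_ipv4_ports); they are worth confirming on the installed build with ompi_info --param btl tcp --level 9 and ompi_info --param oob tcp --level 9. mpi_port_args is a hypothetical helper and the port numbers are arbitrary placeholders.

```shell
# mpi_port_args BTL_MIN BTL_COUNT OOB_RANGE
# Prints MCA flags that confine Open MPI's TCP traffic to known ports.
# Hypothetical helper; parameter names assumed from Open MPI 4.x.
mpi_port_args() {
    printf '%s\n' "--mca btl_tcp_port_min_v4 $1 --mca btl_tcp_port_range_v4 $2 --mca oob_tcp_dynamic_ipv4_ports $3"
}

# Example usage (not run here; ports are placeholders):
# mpirun $(mpi_port_args 50000 100 50100-50200) \
#     -n 2 -hostfile $HOSTFILE_PATH/hosts2 $EXE_PATH/test
```

With these example values, the security list would need an ingress rule for TCP ports 50000-50200 from the subnet's CIDR, in addition to SSH.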

    Regarding "linux - Two MPI compute nodes unable to complete a TCP connection, caused by a firewall", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66854275/
