MPI + InfiniBand: too many connections

Reposted by: 行者123 Updated: 2023-12-02 04:26:36

I am running an MPI application on a cluster, using 4 nodes with 64 cores each. The application performs an all-to-all communication pattern.

Running the application as follows works fine:

$: mpirun -npernode 36 ./Application

Adding more processes per node makes the application crash:

$: mpirun -npernode 37 ./Application

--------------------------------------------------------------------------
A process failed to create a queue pair. This usually means either
the device has run out of queue pairs (too many connections) or
there are insufficient resources available to allocate a queue pair
(out of memory). The latter can happen if either 1) insufficient
memory is available, or 2) no more physical memory can be registered
with the device.

For more information on memory registration see the Open MPI FAQs at:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Local host:             laser045
Local device:           qib0
Queue pair type:        Reliable connected (RC)
--------------------------------------------------------------------------
[laser045:15359] *** An error occurred in MPI_Issend
[laser045:15359] *** on communicator MPI_COMM_WORLD
[laser045:15359] *** MPI_ERR_OTHER: known error not in list
[laser045:15359] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[laser040:49950] [[53382,0],0]->[[53382,1],30] mca_oob_tcp_msg_send_handler: writev failed: Connection reset by peer (104) [sd = 163]
[laser040:49950] [[53382,0],0]->[[53382,1],21] mca_oob_tcp_msg_send_handler: writev failed: Connection reset by peer (104) [sd = 154]
--------------------------------------------------------------------------
mpirun has exited due to process rank 128 with PID 15358 on
node laser045 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[laser040:49950] 4 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / ibv_create_qp failed
[laser040:49950] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[laser040:49950] 4 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal

EDIT: added some source code of the all-to-all communication pattern:

// Send data to all other ranks
for(unsigned i = 0; i < (unsigned)size; ++i){
    if((unsigned)rank == i){
        continue;
    }

    MPI_Request request;
    MPI_Issend(&data, dataSize, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &request);
    requests.push_back(request);
}

// Recv data from all other ranks
for(unsigned i = 0; i < (unsigned)size; ++i){
    if((unsigned)rank == i){
        continue;
    }

    MPI_Status status;
    MPI_Recv(&recvData, recvDataSize, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &status);
}

// Finish communication operations
for(MPI_Request &r: requests){
    MPI_Status status;
    MPI_Wait(&r, &status);
}

Is there anything I can do as a cluster user, or any advice I can pass on to the cluster administrators?

Best answer

The mca_oob_tcp_msg_send_handler error lines may indicate that the node hosting the receiving rank has died (it ran out of memory or received a SIGSEGV):

http://www.open-mpi.org/faq/?category=tcp#tcp-connection-errors

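The OOB (out-of-band) framework in Open MPI is used for control messages, not for the application's messages. Application messages normally go through the byte transfer layers (BTLs), such as self, sm, vader, openib (InfiniBand), and so on.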

The output of "ompi_info -a" is useful in this regard.
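To check the "run out of queue pairs" case named in the error message, the limits that the adapter itself advertises can be read through libibverbs. The following is a minimal sketch and not part of the original answer. It assumes the libibverbs development headers are installed on the compute node and that the program is linked with `-libverbs`. The `ibv_devinfo -v` command reports the same `max_qp` value.

```cpp
// Sketch: print the queue pair limit advertised by each local InfiniBand device.
// Assumption: libibverbs is available; build with: g++ qp_limits.cpp -libverbs
#include <infiniband/verbs.h>
#include <cstdio>

int main() {
    int num_devices = 0;
    ibv_device **devices = ibv_get_device_list(&num_devices);
    if (devices == nullptr) {
        std::perror("ibv_get_device_list");
        return 1;
    }

    for (int i = 0; i < num_devices; ++i) {
        ibv_context *ctx = ibv_open_device(devices[i]);
        if (ctx == nullptr) {
            continue;  // device present but could not be opened
        }

        ibv_device_attr attr;
        if (ibv_query_device(ctx, &attr) == 0) {
            // max_qp is the number of queue pairs the device supports in total.
            std::printf("%s: max_qp=%d max_qp_wr=%d max_cq=%d\n",
                        ibv_get_device_name(devices[i]),
                        attr.max_qp, attr.max_qp_wr, attr.max_cq);
        }
        ibv_close_device(ctx);
    }

    ibv_free_device_list(devices);
    return 0;
}
```

Run it on one of the compute nodes, for example laser045 from the error output, and compare the reported `max_qp` with the number of connections the job needs.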

Finally, the question does not state that the InfiniBand hardware vendor is Mellanox, so the XRC option may not work (for example, Intel/QLogic InfiniBand does not support it).

For more on "MPI + InfiniBand: too many connections", see the similar question on Stack Overflow: https://stackoverflow.com/questions/26576329/
