compiler-errors - Error running OpenMPI code compiled with nvcc (OPAL error)

Reposted. Author: 行者123. Updated: 2023-12-02 10:41:29

I am trying to run OpenMPI code on an NVIDIA Jetson TX2, but I get an OPAL error when I run mpiexec.

Compile command:

$ nvcc -I/home/user/.openmpi/include/ -L/home/user/.openmpi/lib/ -lmpi -std=c++11 *.cu *.cpp -o program
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).

Error output at run time:
$ mpiexec -np 4 ./program 
[user:05728] OPAL ERROR: Not initialized in file pmix2x_client.c at line 109
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[user:05728] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[user:05729] OPAL ERROR: Not initialized in file pmix2x_client.c at line 109
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[user:05729] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[7361,1],0]
Exit code: 1
--------------------------------------------------------------------------

I installed OpenMPI version 3.1.2 with the following commands:
$ ./configure --prefix="/home/user/.openmpi" --with-cuda
$ make; sudo make install

I also set the $PATH and $LD_LIBRARY_PATH variables following the instructions in this link.

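The same program runs successfully on my laptop (Intel i7). When I searched for the error, several links suggested reinstalling OpenMPI. I have tried that many times (including downloading the library again) without success.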

Any help would be greatly appreciated!

Edit

As requested in the comments, I tried running the following minimal code (main.cpp):
#include <iostream>
#include "mpi.h"
#include <string>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    std::cout << rank << '\n';
    MPI_Finalize();
    return 0;
}

To compile it, I reran the earlier command, and running the result gave the same error:
$ nvcc -I/home/user/.openmpi/include/ -L/home/user/.openmpi/lib/ -lmpi -std=c++11 main.cpp -o program

However, if I compile with mpic++, it runs perfectly:
$ mpic++ main.cpp -o ./program
$ mpiexec -np 4 ./program
0
1
3
2
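
Since the mpic++ build works, one thing worth trying is to give nvcc the same flags the wrapper uses instead of a hand-written -I/-L/-lmpi set. The following is a minimal sketch, not a confirmed fix. It assumes bash, the Open MPI wrapper options --showme:compile and --showme:link, and that nvcc needs GCC-style -Wl,... and -pthread flags forwarded through -Xlinker and -Xcompiler. to_nvcc_flags is a hypothetical helper name. The script only prints the resulting nvcc command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# to_nvcc_flags (hypothetical helper): rewrite GCC-style flags so that
# nvcc forwards them to the host compiler and linker.
#   -Wl,<opts>  -> -Xlinker <opts>
#   -pthread    -> -Xcompiler -pthread
# Everything else (-I, -L, -l, ...) is passed through unchanged.
to_nvcc_flags() {
    local flag
    local out=()
    for flag in "$@"; do
        case "$flag" in
            -Wl,*)    out+=(-Xlinker "${flag#-Wl,}") ;;
            -pthread) out+=(-Xcompiler -pthread) ;;
            *)        out+=("$flag") ;;
        esac
    done
    printf '%s\n' "${out[*]}"
}

# Only query the wrapper when it is actually installed.
if command -v mpic++ >/dev/null 2>&1; then
    # Word splitting of the --showme output is intentional here.
    cflags=$(to_nvcc_flags $(mpic++ --showme:compile))
    ldflags=$(to_nvcc_flags $(mpic++ --showme:link))
    # Print the command instead of running it.
    echo nvcc -std=c++11 $cflags main.cpp -o program $ldflags
fi
```

If the wrapper's link flags include an rpath, the resulting binary records where its MPI library lives, which the hand-written nvcc command in the question does not do.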

Best answer

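Is this the version of OpenMPI that you installed? My guess is that you are using different MPI versions for building and for running. Check which mpirun and search for other instances of mpirun. If you are on Ubuntu, run: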

sudo updatedb
locate mpirun
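
Where the locate database is unavailable or stale, walking $PATH gives similar information in lookup order. This is a minimal sketch assuming bash. list_on_path is a hypothetical helper name, not an Open MPI tool.

```shell
#!/usr/bin/env bash
# list_on_path (hypothetical helper): print every executable called NAME
# found in a PATH-style string, in lookup order. The first line printed
# is the one the shell would actually run.
# Usage: list_on_path NAME [PATH_STRING]   (defaults to $PATH)
list_on_path() {
    local name="$1" search_path="${2:-$PATH}" dir
    local IFS=':'
    for dir in $search_path; do
        [ -n "$dir" ] || continue
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}

list_on_path mpirun
list_on_path mpiexec
```

With the install from the question, the first mpirun and mpiexec listed would be expected under /home/user/.openmpi/bin. Any other first entry means a different MPI installation is being launched.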

If you invoke the correct mpirun (the one from the same version used for the build), the error should go away.
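
The mismatch can also sit on the library side: the launcher may come from one installation while the binary loads libmpi from another at run time. The sketch below compares the two. It assumes bash, the ldd tool, and a prefix/bin plus prefix/lib layout as produced by the --prefix install in the question; distribution packages that use multiarch library directories will report a false mismatch. install_prefix and same_install are hypothetical helper names.

```shell
#!/usr/bin/env bash
# install_prefix (hypothetical helper): strip the last two path components,
# e.g. /home/user/.openmpi/bin/mpiexec -> /home/user/.openmpi
install_prefix() {
    dirname "$(dirname "$1")"
}

# same_install (hypothetical helper): succeed when the launcher and the
# library live under the same installation prefix.
same_install() {
    [ "$(install_prefix "$1")" = "$(install_prefix "$2")" ]
}

# Usage sketch; ./program is the binary built in the question.
launcher=$(command -v mpiexec || true)
if [ -n "$launcher" ] && [ -x ./program ]; then
    # ldd prints lines of the form "libmpi.so.N => /some/lib/libmpi.so.N (0x...)";
    # keep only a resolved absolute path.
    libmpi=$(ldd ./program | awk '/libmpi/ && $3 ~ /^\// { print $3; exit }')
    if [ -n "$libmpi" ] && same_install "$launcher" "$libmpi"; then
        echo "match: $launcher and $libmpi"
    else
        echo "mismatch: launcher=$launcher libmpi=${libmpi:-not found}"
    fi
fi
```

A mismatch here would be consistent with the answer's guess that the build and the run use different MPI installations.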

Regarding compiler-errors - Error running OpenMPI code compiled with nvcc (OPAL error), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53149888/
