openmpi - mpi4py irecv causes segmentation fault

Reposted. Author: 行者123. Updated: 2023-12-04 10:49:19

I am running the following code, which sends an array from rank 0 to rank 1, using the command mpirun -n 2 python -u test_irecv.py > output 2>&1

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
asyncr = 1
size_arr = 10000

if comm.Get_rank()==0:
    arrs = np.zeros(size_arr)
    if asyncr: comm.isend(arrs, dest=1).wait()
    else: comm.send(arrs, dest=1)
else:
    if asyncr: arrv = comm.irecv(source=0).wait()
    else: arrv = comm.recv(source=0)

print('Done!', comm.Get_rank())

Running in synchronous mode with asyncr = 0 gives the expected output:
Done! 0
Done! 1

However, running in asynchronous mode with asyncr = 1 gives the error shown below.
I need to know why it runs fine in synchronous mode but not in asynchronous mode.

Output with asyncr = 1:
Done! 0
[nia1477:420871:0:420871] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x138)
==== backtrace ====
0 0x0000000000010e90 __funlockfile() ???:0
1 0x00000000000643d1 ompi_errhandler_request_invoke() ???:0
2 0x000000000008a8b5 __pyx_f_6mpi4py_3MPI_PyMPI_wait() /tmp/eb-A2FAdY/pip-req-build-dvnprmat/src/mpi4py.MPI.c:49819
3 0x000000000008a8b5 __pyx_f_6mpi4py_3MPI_PyMPI_wait() /tmp/eb-A2FAdY/pip-req-build-dvnprmat/src/mpi4py.MPI.c:49819
4 0x000000000008a8b5 __pyx_pf_6mpi4py_3MPI_7Request_34wait() /tmp/eb-A2FAdY/pip-req-build-dvnprmat/src/mpi4py.MPI.c:83838
5 0x000000000008a8b5 __pyx_pw_6mpi4py_3MPI_7Request_35wait() /tmp/eb-A2FAdY/pip-req-build-dvnprmat/src/mpi4py.MPI.c:83813
6 0x00000000000966a3 _PyMethodDef_RawFastCallKeywords() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Objects/call.c:690
7 0x000000000009eeb9 _PyMethodDescr_FastCallKeywords() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Objects/descrobject.c:288
8 0x000000000006e611 call_function() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/ceval.c:4563
9 0x000000000006e611 _PyEval_EvalFrameDefault() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/ceval.c:3103
10 0x0000000000177644 _PyEval_EvalCodeWithName() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/ceval.c:3923
11 0x000000000017774e PyEval_EvalCodeEx() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/ceval.c:3952
12 0x000000000017777b PyEval_EvalCode() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/ceval.c:524
13 0x00000000001aab72 run_mod() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/pythonrun.c:1035
14 0x00000000001aab72 PyRun_FileExFlags() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/pythonrun.c:988
15 0x00000000001aace6 PyRun_SimpleFileExFlags() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/pythonrun.c:430
16 0x00000000001cad47 pymain_run_file() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Modules/main.c:425
17 0x00000000001cad47 pymain_run_filename() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Modules/main.c:1520
18 0x00000000001cad47 pymain_run_python() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Modules/main.c:2520
19 0x00000000001cad47 pymain_main() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Modules/main.c:2662
20 0x00000000001cb1ca _Py_UnixMain() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Modules/main.c:2697
21 0x00000000000202e0 __libc_start_main() ???:0
22 0x00000000004006ba _start() /tmp/nix-build-glibc-2.24.drv-0/glibc-2.24/csu/../sysdeps/x86_64/start.S:120
===================
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 420871 on node nia1477 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The versions are as follows:
  • Python: 3.7.0
  • mpi4py: 3.0.0
  • mpiexec --version gives mpiexec (OpenRTE) 3.1.2
  • mpicc -v gives icc version 18.0.3 (gcc version 7.3.0 compatibility)

Running with asyncr = 1 on another system with MPICH gives the following output:
    Done! 0
    Traceback (most recent call last):
    File "test_irecv.py", line 14, in <module>
    if asyncr: arrv = comm.irecv(source=0).wait()
    File "mpi4py/MPI/Request.pyx", line 235, in mpi4py.MPI.Request.wait
    File "mpi4py/MPI/msgpickle.pxi", line 411, in mpi4py.MPI.PyMPI_wait
    mpi4py.MPI.Exception: MPI_ERR_TRUNCATE: message truncated
    -------------------------------------------------------
    Primary job terminated normally, but 1 process returned
    a non-zero exit code.. Per user-direction, the job has been aborted.
    -------------------------------------------------------
    --------------------------------------------------------------------------
    mpirun detected that one or more processes exited with non-zero status, thus causing
    the job to be terminated. The first process to do so was:

    Process name: [[23830,1],1]
    Exit code: 1
    --------------------------------------------------------------------------
    [master:01977] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
    [master:01977] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Best answer

Apparently this is a known issue in mpi4py, as described in the issue at https://bitbucket.org/mpi4py/mpi4py/issues/65/mpi_err_truncate-message-truncated-when where Lisandro Dalcin says:

    The implementation of irecv() for large messages requires users to pass a buffer-like object large enough to receive the pickled stream. This is not documented (as most of mpi4py), and even non-obvious and unpythonic...
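
To see why the default call fails for this array, it helps to compare the size of the pickled payload with the receive buffer. The sketch below is not from the original answer. It assumes that irecv() called without a buffer allocates 32 KiB (1 << 15 bytes), which is what the mpi4py 3.0 sources appear to do; treat that number as an assumption. It also uses the standard-library array module as a stand-in for np.zeros(10000), so it runs without NumPy or MPI.

```python
import array
import pickle

# ASSUMPTION: size of the buffer irecv() allocates when none is passed.
ASSUMED_DEFAULT_IRECV_BYTES = 1 << 15


def pickled_size(obj):
    """Return the number of bytes pickle needs to serialize obj."""
    return len(pickle.dumps(obj))


# Stand-in for np.zeros(10000): 10000 float64 values, 80000 bytes of raw data.
payload = array.array('d', [0.0] * 10000)

size = pickled_size(payload)
print("pickled size in bytes:", size)
print("fits in assumed default buffer:", size <= ASSUMED_DEFAULT_IRECV_BYTES)
print("fits in bytearray(1 << 20):", size <= (1 << 20))
```

The pickled stream is larger than the 80000 bytes of raw data, so it cannot fit in a 32 KiB buffer, while a 1 MiB buffer has plenty of room.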



The fix is to pass a sufficiently large preallocated bytearray to irecv. A working example follows.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    size_arr = 10000

    if comm.Get_rank()==0:
        arrs = np.zeros(size_arr)
        comm.isend(arrs, dest=1).wait()
    else:
        arrv = comm.irecv(bytearray(1<<20), source=0).wait()

    print('Done!', comm.Get_rank())
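
If the receiver knows the element count in advance, the buffer can be sized from it instead of hard-coding 1<<20. The following is a minimal sketch under that assumption. The helpers recv_buffer_for and irecv_array are hypothetical names, not part of mpi4py, and the 4096-byte slack is a guessed allowance for pickle metadata rather than a documented bound. The comm argument is expected to be an mpi4py communicator such as MPI.COMM_WORLD.

```python
def recv_buffer_for(n_items, item_bytes=8, slack=4096):
    """Preallocate a bytearray for a pickled array of n_items elements.

    item_bytes=8 matches float64. slack is a guessed allowance for the
    pickle header and array metadata, not a documented limit.
    """
    return bytearray(n_items * item_bytes + slack)


def irecv_array(comm, source, n_items):
    """Post a nonblocking receive with an explicitly sized buffer.

    comm is assumed to be an mpi4py communicator; the returned request
    is completed with .wait(), as in the example above.
    """
    buf = recv_buffer_for(n_items)
    return comm.irecv(buf, source=source)
```

On rank 1 this would replace the receive line with arrv = irecv_array(comm, 0, size_arr).wait(). A buffer that is too small still fails with a truncation error, so when the payload size is unknown a generous fixed size like the one in the answer is the safer choice.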

Regarding "openmpi - mpi4py irecv causes segmentation fault", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59559597/
