c - Segmentation fault when using MPI_Isend


Reposted by: 太空宇宙   Updated: 2023-11-04 03:13:18

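The aim of my program is to calculate the electrostatic potential between an inner and an outer conductor by dividing the space into a grid and then splitting the grid into slices. Each processor takes one slice and runs the calculation on it. I use MPI_Isend and MPI_Irecv to send data between processors. When testing the code I get a segmentation fault: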

[physnode5:81440] *** Process received signal ***
[physnode5:81440] Signal: Segmentation fault (11)
[physnode5:81440] Signal code: Address not mapped (1)
[physnode5:81440] Failing at address: 0x58
[physnode5:81440] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2ab8069df5d0]
[physnode5:81440] [ 1] /opt/yarcc/libraries/openmpi/2.1.0/1/default/lib/libmpi.so.20(ompi_request_default_wait+0xd)[0x2ab8066495ed]
[physnode5:81440] [ 2] /opt/yarcc/libraries/openmpi/2.1.0/1/default/lib/libmpi.so.20(MPI_Wait+0x5d)[0x2ab80667a00d]
[physnode5:81440] [ 3] ./mpi_tezt.exe[0x400ffc]
[physnode5:81440] [ 4] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2ab806c0e3d5]
[physnode5:81440] [ 5] ./mpi_tezt.exe[0x4009b9]
[physnode5:81440] *** End of error message ***

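This happens when the code is run. Please note that I am running on a cluster; the executable is called mpi_tezt.exe (yes, I misspelled it). I have checked that the arrays I send are allocated correctly, and that the sends and receives never send or receive data that doesn't exist (i.e. data outside the bounds of the arrays being sent). My code with the MPI_Isend and MPI_Irecv calls is below: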

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  /*MPI Specific Variables*/
  int my_size, my_rank, up, down;
  MPI_Request reqU, reqD, sreqU, sreqD;
  MPI_Status rUstatus, rDstatus, sUstatus, sDstatus;

   /*Physical Dimensions*/
  double Linner = 5.0;/*mm*/
  double Rinner = 1.0;/*mm*/
  double phi_0 = 1000.0;/*V*/

  /*Other Variables*/
  int grid_size = 100;
  int slice;
  int x,y;
  double grid_res_y = 0.2;
  double grid_res_x = 0.1;
  int xboundary, yboundary;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &my_size);

  /*Determining neighbours*/
  if (my_rank != 0) /*if statements used to keep the neighbours of the highest and lowest ranks inside the 0 to my_size-1 range of ranks*/
    {
      up = my_rank-1;
    }
  else
    {
      up = 0;
    }

  if(my_rank != my_size-1)
    {
      down = my_rank+1;
    }
  else
    {
      down = my_size-1;
    }

  /*cross-check: presumed my_size is a factor of gridsize else there are odd sized slices and this is not coded for*/
  if (grid_size%my_size != 0)
    {
      printf("ERROR - number of procs =  %d, this is not a factor of grid_size %d\n", my_size, grid_size);
      exit(0);
    }

  /*Set Up Distributed Data Approach*/
  slice = grid_size/my_size;

  yboundary = Linner/grid_res_y; /*y grid index of inner conductor wall*/ 
  xboundary = Rinner/grid_res_x; /*x grid and individual array index of inner conductor wall*/


  double phi[slice+2][grid_size]; /*extra 2 rows to allow for halo data*/

  for (y=0; y < slice+2; y++)
    {
      for (x=0; x < grid_size; x++)
        { 
          phi[y][x] = 0.0;
        }
    }

  if(my_rank == 0) /*Boundary Containing rank does 2 loops. One over part with inner conductor and one over part without inner conductor*/
    {
      for(y=0; y < slice+1; y++)
        {
          for(x=xboundary; x < grid_size; x++)
            {
              phi[y][x] = phi_0;
            }
        }   
    }


  if (my_rank < my_size-1)
    {
      /*send top most strip up one node to be received as bottom halo*/
      MPI_Isend(&phi[1][0], grid_size  , MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &sreqU);  
      /*recv top halo from up one node*/
      MPI_Irecv(&phi[slice+1][0], grid_size, MPI_DOUBLE, down, 2, MPI_COMM_WORLD, &reqU);
    }

  if (my_rank > 0)
    {
      /*recv top halo from down one node*/
      MPI_Irecv(&phi[0][0], grid_size , MPI_DOUBLE, up, 2, MPI_COMM_WORLD, &reqD);
      /*send bottom most strip down one node to be received as top halo*/
      MPI_Isend(&phi[slice][0], grid_size , MPI_DOUBLE, up, 1, MPI_COMM_WORLD, &sreqD);
    }

  if (my_rank<my_size-1)
    {
      /*Wait for send to down one rank to complete*/
      MPI_Wait(&sreqD, &sDstatus);
      /*Wait for receive from up one rank to complete*/
      MPI_Wait(&reqD, &rDstatus);
    }

  if (my_rank>0)
    {
      /*Wait for send to up one rank to complete*/
      MPI_Wait(&sreqU, &sUstatus);
      /*Wait for receive from down one rank to complete*/
      MPI_Wait(&reqU, &rUstatus);
    }


  MPI_Finalize();

  return 0;
}

I have been testing on 2 processors (ranks 0 and 1) and hope to scale it up to more.

Any ideas where the error might be?

Best answer

You are faulting in the first MPI_Wait (on rank 0). This is step 7 in the example code below.

Using mpirun -np 2 ./whatever:

sreqD does not appear to be set correctly. It is set at step 5, which only rank 1 executes.

However, step 7 is executed by rank 0, which never sets sreqD.

So you need to adjust your if statements so that they correctly match which rank executes which MPI_Wait, and so on.

Here is the code with some debug printf statements:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <mpi.h>

int
main(int argc, char *argv[])
{
    /* MPI Specific Variables */
    int my_size,
     my_rank,
     up,
     down;
    MPI_Request reqU,
     reqD,
     sreqU,
     sreqD;
    MPI_Status rUstatus,
     rDstatus,
     sUstatus,
     sDstatus;

    /* Physical Dimensions */
    double Linner = 5.0;                /* mm */
    double Rinner = 1.0;                /* mm */
    double phi_0 = 1000.0;              /* V */

    /* Other Variables */
    int grid_size = 100;
    int slice;
    int x,
     y;
    double grid_res_y = 0.2;
    double grid_res_x = 0.1;

    int xboundary,
     yboundary;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &my_size);

    /* Determining neighbours */
    /* if statements used to keep the neighbours of the highest and lowest
    ranks inside the 0 to my_size-1 range of ranks */
    if (my_rank != 0) {
        up = my_rank - 1;
    }
    else {
        up = 0;
    }

    if (my_rank != my_size - 1) {
        down = my_rank + 1;
    }
    else {
        down = my_size - 1;
    }

    printf("my_rank=%d my_size=%d up=%d down=%d\n",my_rank,my_size,up,down);

    /* cross-check: presumed my_size is a factor of gridsize else there are
    odd sized slices and this is not coded for */
    if (grid_size % my_size != 0) {
        printf("ERROR - number of procs =  %d, this is not a factor of grid_size %d\n", my_size, grid_size);
        exit(0);
    }

    /* Set Up Distributed Data Approach */
    slice = grid_size / my_size;

    /* y grid index of inner conductor wall */
    yboundary = Linner / grid_res_y;
    /* x grid and individual array index of inner conductor wall */
    xboundary = Rinner / grid_res_x;

    if (my_rank == 0) {
        printf("Linner=%g grid_res_y=%g yboundary=%d\n",
            Linner,grid_res_y,yboundary);
        printf("Rinner=%g grid_res_x=%g xboundary=%d\n",
            Rinner,grid_res_x,xboundary);
        printf("slice=%d grid_size=%d phi=%zu\n",
            slice,grid_size,sizeof(double) * (slice + 2) * grid_size);
    }

    /* extra 2 rows to allow for halo data */
    double phi[slice + 2][grid_size];

    for (y = 0; y < slice + 2; y++) {
        for (x = 0; x < grid_size; x++) {
            phi[y][x] = 0.0;
        }
    }

    /* Boundary Containing rank does 2 loops. One over part with inner
    conductor and one over part without inner conductor */
    if (my_rank == 0) {
        for (y = 0; y < slice + 1; y++) {
            for (x = xboundary; x < grid_size; x++) {
                phi[y][x] = phi_0;
            }
        }
    }

    if (my_rank < my_size - 1) {
        /* send top most strip up one node to be received as bottom halo */
        printf("1: my_rank=%d MPI_Isend\n",my_rank);
        MPI_Isend(&phi[1][0], grid_size, MPI_DOUBLE, down, 1, MPI_COMM_WORLD,
            &sreqU);

        /* recv top halo from up one node */
        printf("2: my_rank=%d MPI_Irecv\n",my_rank);
        MPI_Irecv(&phi[slice + 1][0], grid_size, MPI_DOUBLE, down, 2,
            MPI_COMM_WORLD, &reqU);

        printf("3: my_rank=%d\n",my_rank);
    }

    if (my_rank > 0) {
        /* recv top halo from down one node */
        printf("4: my_rank=%d MPI_Irecv\n",my_rank);
        MPI_Irecv(&phi[0][0], grid_size, MPI_DOUBLE, up, 2, MPI_COMM_WORLD,
            &reqD);

        /* send bottom most strip down one node to be received as top halo */
        printf("5: my_rank=%d MPI_Isend\n",my_rank);
        MPI_Isend(&phi[slice][0], grid_size, MPI_DOUBLE, up, 1, MPI_COMM_WORLD,
            &sreqD);

        printf("6: my_rank=%d\n",my_rank);
    }

    if (my_rank < my_size - 1) {
        /* Wait for send to down one rank to complete */
        printf("7: my_rank=%d\n",my_rank);
        MPI_Wait(&sreqD, &sDstatus);
        printf("8: my_rank=%d\n",my_rank);

        /* Wait for receive from up one rank to complete */
        printf("9: my_rank=%d\n",my_rank);
        MPI_Wait(&reqD, &rDstatus);
        printf("10: my_rank=%d\n",my_rank);
    }

    if (my_rank > 0) {
        /* Wait for send to up one rank to complete */
        printf("11: my_rank=%d\n",my_rank);
        MPI_Wait(&sreqU, &sUstatus);
        printf("12: my_rank=%d\n",my_rank);

        /* Wait for receive from down one rank to complete */
        printf("13: my_rank=%d\n",my_rank);
        MPI_Wait(&reqU, &rUstatus);
        printf("14: my_rank=%d\n",my_rank);
    }

    MPI_Finalize();

    return 0;
}

Here is the output. Note that step 7 is printed (just before the first MPI_Wait for rank 0). However, rank 0 never reaches step 8 (the printf after that call).

my_rank=0 my_size=2 up=0 down=1
Linner=5 grid_res_y=0.2 yboundary=25
Rinner=1 grid_res_x=0.1 xboundary=10
slice=50 grid_size=100 phi=41600
1: my_rank=0 MPI_Isend
2: my_rank=0 MPI_Irecv
3: my_rank=0
7: my_rank=0
my_rank=1 my_size=2 up=0 down=1
4: my_rank=1 MPI_Irecv
5: my_rank=1 MPI_Isend
6: my_rank=1
11: my_rank=1
[manderly:230404] *** Process received signal ***
[manderly:230403] *** Process received signal ***
[manderly:230403] Signal: Segmentation fault (11)
[manderly:230403] Signal code: Address not mapped (1)
[manderly:230403] Failing at address: 0x58
[manderly:230404] Signal: Segmentation fault (11)
[manderly:230404] Signal code: Address not mapped (1)
[manderly:230404] Failing at address: 0x58
[manderly:230403] [ 0] [manderly:230404] [ 0] /lib64/libpthread.so.0(+0x121c0)/lib64/libpthread.so.0(+0x121c0)[0x7fa5478341c0]
[0x7fa0ebe951c0]
[manderly:230404] [ 1] [manderly:230403] [ 1] /usr/lib64/openmpi/lib/libmpi.so.20(ompi_request_default_wait+0x31)[0x7fa0ec0e9a81]
[manderly:230404] [ 2] /usr/lib64/openmpi/lib/libmpi.so.20(ompi_request_default_wait+0x31)[0x7fa547a88a81]
[manderly:230403] [ 2] /usr/lib64/openmpi/lib/libmpi.so.20(PMPI_Wait+0x60)[0x7fa0ec12c350]
[manderly:230404] [ 3] ./fix2[0x400f93]
[manderly:230404] [ 4] /usr/lib64/openmpi/lib/libmpi.so.20(PMPI_Wait+0x60)[0x7fa547acb350]
[manderly:230403] [ 3] ./fix2[0x400ef7]
/lib64/libc.so.6(__libc_start_main+0xea)[0x7fa0ebaedfea]
[manderly:230404] [ 5] ./fix2[0x40081a[manderly:230403] [ 4] ]
[manderly:230404] *** End of error message ***
/lib64/libc.so.6(__libc_start_main+0xea)[0x7fa54748cfea]
[manderly:230403] [ 5] ./fix2[0x40081a]
[manderly:230403] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node manderly exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

A similar question, "c - Segmentation fault when using MPI_Isend", can be found on Stack Overflow: https://stackoverflow.com/questions/54219440/
