c - 将二维数组的分布式 block 发送到 MPI 中的根进程-6ren

c - 将二维数组的分布式 block 发送到 MPI 中的根进程

转载作者：行者123 更新时间：2023-12-05 01:29:05

我有一个分布在 MPI 进程网格中的 2D 数组(在本例中为 3 x 2 进程)。数组的值是在该数组块分发到的进程中生成的，我想在根进程中将所有这些块收集在一起以显示它们。

到目前为止，我有下面的代码。这会生成一个笛卡尔通信器，找出 MPI 过程的坐标，并根据它计算出它应该获得多少数组(因为数组不必是笛卡尔网格大小的倍数)。然后我创建了一个新的 MPI 派生数据类型，它将整个进程子数组作为一个项目发送(也就是说，每个进程的步幅、块长度和计数都不同，因为每个进程都有不同大小的数组)。但是，当我与 MPI_Gather 一起收集数据时，出现段错误。

我认为这是因为我不应该在 MPI_Gather 调用中使用相同的数据类型进行发送和接收。数据类型适合发送数据，因为它具有正确的计数、步幅和块长度，但是当它到达另一端时，它将需要一个非常不同的派生数据类型。我不确定如何计算此数据类型的参数 - 有没有人有任何想法？

另外，如果我从完全错误的角度接近这个，那么请告诉我!

#include<stdio.h>
#include<array_alloc.h>
#include<math.h>
#include<mpi.h>

int main(int argc, char ** argv)
{
    int size, rank;
    int dim_size[2];
    int periods[2];
    int A = 2;
    int B = 3;
    MPI_Comm cart_comm;
    MPI_Datatype block_type;
    int coords[2];

    float **array;
    float **whole_array;

    int n = 10;
    int rows_per_core;
    int cols_per_core;
    int i, j;

    int x_start, x_finish;
    int y_start, y_finish;

    /* Initialise MPI */
    MPI_Init(&argc, &argv);

    /* Get the rank for this process, and the number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
    {
        /* If we're the master process */
        whole_array = alloc_2d_float(n, n);

        /* Initialise whole array to silly values */
        for (i = 0; i < n; i++)
        {
            for (j = 0; j < n; j++)
            {
                whole_array[i][j] = 9999.99;
            }
        }

        for (j = 0; j < n; j ++)
        {
            for (i = 0; i < n; i++)
            {
                printf("%f ", whole_array[j][i]);
            }
            printf("\n");
        }
    }

    /* Create the cartesian communicator */
    dim_size[0] = B;
    dim_size[1] = A;
    periods[0] = 1;
    periods[1] = 1;

    MPI_Cart_create(MPI_COMM_WORLD, 2, dim_size, periods, 1, &cart_comm);

    /* Get our co-ordinates within that communicator */
    MPI_Cart_coords(cart_comm, rank, 2, coords);

    rows_per_core = ceil(n / (float) A);
    cols_per_core = ceil(n / (float) B);

    if (coords[0] == (B - 1))
    {
        /* We're at the far end of a row */
        cols_per_core = n - (cols_per_core * (B - 1));
    }
    if (coords[1] == (A - 1))
    {
        /* We're at the bottom of a col */
        rows_per_core = n - (rows_per_core * (A - 1));
    }

    printf("X: %d, Y: %d, RpC: %d, CpC: %d\n", coords[0], coords[1], rows_per_core, cols_per_core);

    MPI_Type_vector(rows_per_core, cols_per_core, cols_per_core + 1, MPI_FLOAT, &block_type);
    MPI_Type_commit(&block_type);

    array = alloc_2d_float(rows_per_core, cols_per_core);

    if (array == NULL)
    {
        printf("Problem with array allocation.\nExiting\n");
        return 1;
    }

    for (j = 0; j < rows_per_core; j++)
    {
        for (i = 0; i < cols_per_core; i++)
        {
            array[j][i] = (float) (i + 1);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Gather(array, 1, block_type, whole_array, 1, block_type, 0, MPI_COMM_WORLD);

    /*
    if (rank == 0)
    {
        for (j = 0; j < n; j ++)
        {
            for (i = 0; i < n; i++)
            {
                printf("%f ", whole_array[j][i]);
            }
            printf("\n");
        }
    }
    */
    /* Close down the MPI environment */
    MPI_Finalize();
}

我上面使用的二维数组分配例程实现为:

float **alloc_2d_float( int ndim1, int ndim2 ) {

  float **array2 = malloc( ndim1 * sizeof( float * ) );

  int i;

  if( array2 != NULL ){

    array2[0] = malloc( ndim1 * ndim2 * sizeof( float ) );

    if( array2[ 0 ] != NULL ) {

      for( i = 1; i < ndim1; i++ )
    array2[i] = array2[0] + i * ndim2;

    }

    else {
      free( array2 );
      array2 = NULL;
    }

  }

  return array2;

}

最佳答案

这是一个棘手的问题。您走在正确的轨道上，是的，您将需要不同类型的发送和接收。

发送部分很简单——如果你发送整个子阵列 array ，那么你甚至不需要 vector 类型；您可以发送整个(rows_per_core)*(cols_per_core)从 &(array[0][0]) 开始的连续 float (或 array[0] ，如果您愿意)。

正如您所收集的那样，接收是棘手的部分。让我们从最简单的情况开始——假设所有东西均分，因此所有块都具有相同的大小。然后你可以使用非常有用的 MPI_Type_create_subarray (您总是可以将其与 vector 类型拼凑在一起，但对于高维数组，这变得乏味，因为您需要为数组的每个维度创建 1 个中间类型，除了最后一个...

此外，您可以使用同样有用的 MPI_Dims_create，而不是对分解进行硬编码。创建尽可能平方的排名分解。笔记
这不一定与 MPI_Cart_create 有任何关系，尽管您可以将它用于请求的维度。我将在这里跳过cart_create 的内容，不是因为它没有用，而是因为我想专注于gather 的内容。

所以如果每个人都有相同大小的array ，那么 root 正在从每个人那里接收相同的数据类型，并且可以使用一个非常简单的子数组类型来获取他们的数据:

MPI_Type_create_subarray(2, whole_array_size, sub_array_size, starts,
                         MPI_ORDER_C, MPI_FLOAT, &block_type);
MPI_Type_commit(&block_type);

哪里 sub_array_size[] = {rows_per_core, cols_per_core} , whole_array_size[] = {n,n} ，对于这里， starts[]={0,0} - 例如，我们只是假设一切都从头开始。
这样做的原因是我们可以使用 Gatherv 将位移显式设置到数组中:

for (int i=0; i<size; i++) {
    counts[i] = 1;   /* one block_type per rank */

    int row = (i % A);
    int col = (i / A);
    /* displacement into the whole_array */
    disps[i] = (col*cols_per_core + row*(rows_per_core)*n);
}

MPI_Gatherv(array[0], rows_per_core*cols_per_core, MPI_FLOAT,
            recvptr, counts, disps, resized_type, 0, MPI_COMM_WORLD);

所以现在每个人都将他们的数据以一个块的形式发送，并且它被接收到数组的右侧部分的类型中。为此，我调整了类型的大小，使其范围只是一个浮点数，因此可以以该单位计算位移:

MPI_Type_create_resized(block_type, 0, 1*sizeof(float), &resized_type);
MPI_Type_commit(&resized_type);

整个代码如下:

#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include<mpi.h>

float **alloc_2d_float( int ndim1, int ndim2 ) {
    float **array2 = malloc( ndim1 * sizeof( float * ) );
    int i;

    if( array2 != NULL ){
        array2[0] = malloc( ndim1 * ndim2 * sizeof( float ) );
        if( array2[ 0 ] != NULL ) {
            for( i = 1; i < ndim1; i++ )
                array2[i] = array2[0] + i * ndim2;
        }

        else {
            free( array2 );
            array2 = NULL;
        }
    }
    return array2;
}

void free_2d_float( float **array ) {
    if (array != NULL) {
        free(array[0]);
        free(array);
    }
    return;
}

void init_array2d(float **array, int ndim1, int ndim2, float data) {
    for (int i=0; i<ndim1; i++) 
        for (int j=0; j<ndim2; j++)
            array[i][j] = data;
    return;
}

void print_array2d(float **array, int ndim1, int ndim2) {
    for (int i=0; i<ndim1; i++) {
        for (int j=0; j<ndim2; j++) {
            printf("%6.2f ", array[i][j]);
        }
        printf("\n");
    }
    return;
}


int main(int argc, char ** argv)
{
    int size, rank;
    int dim_size[2];
    int periods[2];
    MPI_Datatype block_type, resized_type;

    float **array;
    float **whole_array;
    float *recvptr;

    int *counts, *disps;

    int n = 10;
    int rows_per_core;
    int cols_per_core;
    int i, j;

    int whole_array_size[2];
    int sub_array_size[2];
    int starts[2];
    int A, B;

    /* Initialise MPI */
    MPI_Init(&argc, &argv);

    /* Get the rank for this process, and the number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
    {
        /* If we're the master process */
        whole_array = alloc_2d_float(n, n);
        recvptr = &(whole_array[0][0]);

        /* Initialise whole array to silly values */
        for (i = 0; i < n; i++)
        {
            for (j = 0; j < n; j++)
            {
                whole_array[i][j] = 9999.99;
            }
        }

        print_array2d(whole_array, n, n);
        puts("\n\n");
    }

    /* Create the cartesian communicator */
    MPI_Dims_create(size, 2, dim_size);
    A = dim_size[1];
    B = dim_size[0];
    periods[0] = 1;
    periods[1] = 1;

    rows_per_core = ceil(n / (float) A);
    cols_per_core = ceil(n / (float) B);
    if (rows_per_core*A != n) {
        if (rank == 0) fprintf(stderr,"Aborting: rows %d don't divide by %d evenly\n", n, A);
        MPI_Abort(MPI_COMM_WORLD,1);
    }
    if (cols_per_core*B != n) {
        if (rank == 0) fprintf(stderr,"Aborting: cols %d don't divide by %d evenly\n", n, B);
        MPI_Abort(MPI_COMM_WORLD,2);
    }

    array = alloc_2d_float(rows_per_core, cols_per_core);
    printf("%d, RpC: %d, CpC: %d\n", rank, rows_per_core, cols_per_core);

    whole_array_size[0] = n;             
    sub_array_size  [0] = rows_per_core; 
    whole_array_size[1] = n;
    sub_array_size  [1] = cols_per_core;
    starts[0] = 0; starts[1] = 0;

    MPI_Type_create_subarray(2, whole_array_size, sub_array_size, starts, 
                             MPI_ORDER_C, MPI_FLOAT, &block_type);
    MPI_Type_commit(&block_type);
    MPI_Type_create_resized(block_type, 0, 1*sizeof(float), &resized_type);
    MPI_Type_commit(&resized_type);

    if (array == NULL)
    {
        printf("Problem with array allocation.\nExiting\n");
        MPI_Abort(MPI_COMM_WORLD,3);
    }

    init_array2d(array,rows_per_core,cols_per_core,(float)rank);

    counts = (int *)malloc(size * sizeof(int));
    disps  = (int *)malloc(size * sizeof(int));
    /* note -- we're just using MPI_COMM_WORLD rank here to
     * determine location, not the cart_comm for now... */
    for (int i=0; i<size; i++) {
        counts[i] = 1;   /* one block_type per rank */

        int row = (i % A);
        int col = (i / A);
        /* displacement into the whole_array */
        disps[i] = (col*cols_per_core + row*(rows_per_core)*n);
    }

    MPI_Gatherv(array[0], rows_per_core*cols_per_core, MPI_FLOAT, 
                recvptr, counts, disps, resized_type, 0, MPI_COMM_WORLD);

    free_2d_float(array);
    if (rank == 0) print_array2d(whole_array, n, n);
    if (rank == 0) free_2d_float(whole_array);
    MPI_Finalize();
}

小事——你在聚集前不需要屏障。事实上，你几乎从来不需要屏障，而且出于某些原因，它们是昂贵的操作，并且可以隐藏问题——我的经验法则是永远不要使用屏障，除非你确切地知道为什么需要使用屏障在这种情况下坏了。特别是在这种情况下，集体 gather例程与屏障执行完全相同的同步，所以只需使用它。

现在，转向更难的东西。如果事情没有平均分配，你有几个选择。最简单的，虽然不一定是最好的，只是填充数组，使其均匀划分，即使只是为了这个操作。

如果您可以安排它使列数不均匀划分，即使行数没有，那么您仍然可以使用gatherv 并为行的每个部分创建一个 vector 类型，并gatherv 为适当的数字来自每个处理器的行数。那会工作得很好。

如果你肯定有这样的情况，既不能指望分开，又不能填充数据发送，那么我可以看到三个子选项:

正如 susterpatt 建议的那样，做点对点。对于少量任务，这很好，但随着它变大，这将比集体操作效率低得多。

创建一个由不在外边缘的所有处理器组成的通信器，并完全使用上面的代码来收集它们的代码；然后点对点边缘任务的数据。

根本不要聚集来处理 0；使用 Distributed array type描述数组的布局，并使用MPI-IO将所有数据写入文件；完成后，如果您愿意，您可以让零进程以某种方式显示数据。

关于c - 将二维数组的分布式 block 发送到 MPI 中的根进程，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/5530648/

文章推荐： javascript - textContent() 格式化文本中没有空格

文章推荐： c++ - 万无一失的配置方式#define/#ifdef

文章推荐： typescript 既不是，也不是，但不是两个属性

文章推荐： sql - 如何在关系数据库中存储连续一周的展示次数？

c++ int 数组，值为 2 维 int 数组(3d 数组)
我正在尝试创建一个包含 int[][] 项的数组即 int version0Indexes[][4] = { {1,2,3,4}, {5,6,7,8} }; int version1Indexes[
Java 数组[i]++ 与++数组[i]
我有一个整数数组: private int array[]; 如果我还有一个名为 add 的方法，那么以下有什么区别: public void add(int value) { array[va
JavaScript 数组 + 数组 = 字符串？
当您尝试在 JavaScript 中将一个数组添加到另一个数组时，它会将其转换为一个字符串。通常，当以另一种语言执行此操作时，列表会合并。 JavaScript [1, 2] + [3, 4] = "
数组
根据我正在阅读的教程，如果您想创建一个包含 5 列和 3 行的表格来表示这样的数据... 45 4 34 99 56 3 23 99 43 2 1 1 0 43 67 ...它说你可以使用下
数组
我通常使用 python 编写脚本/程序，但最近开始使用 JavaScript 进行编程，并且在使用数组时遇到了一些问题。在 python 中，当我创建一个数组并使用 for x in y 时，我得
数组 toString() 中的 javascript 数组
我有一个这样的数组: temp = [ 'data1', ['data1_a','data1_b'], ['data2_a','data2_b','data2_c'] ]; // 我想使用 toStr
php - 如何将秒表结果(数组)推送到第一个表结果(数组)
rent_property (table name) id fullName propertyName 1 A House Name1 2 B
C++ 数组 [索引] 与索引 [数组]
这个问题在这里已经有了答案: 关闭13年前。 Possible Duplicate: In C arrays why is this true? a[5] == 5[a] array[index] 和
excel - 将用户名(数组)与电子邮件(数组)匹配
使用 Excel 2013。经过多年的寻找和适应，我的第一篇文章。我正在尝试将当前 App 用户(即“John Smith”)与他的电子邮件地址“jsmith@work.com”进行匹配。使用两个
r - 3D 数组 -> 应用 -> 3D 数组
当仅在一个边距上操作时，apply 似乎不会重新组装 3D 数组。考虑: arr 1)，但对我来说仍然很奇怪，如果一个函数返回一个具有尺寸的对象，那么它们基本上会被忽略。最佳答案这是一个不太理
javascript - php 数组(数组)到 javascript
我有一个包含 GPS 坐标的 MySQL 数据库。这是我检索坐标的部分 PHP 代码； $sql = "SELECT lat, lon FROM gps_data"; $stmt=$db->query
python - 查找最后一个非零元素 3D 数组 - numpy 数组
我需要找到一种方法来执行这个操作，我有一个形状数组 [批量大小, 150, 1] 代表 batch_size 整数序列，每个序列有 150 个元素长，但在每个序列中都有很多添加的零，以使所有序列具有相
android - 如何在json中访问对象>数组>对象>数组>对象？
我必须通过 url 中的 json 获取文本。层次结构如下: 对象>数组>对象>数组>对象。我想用这段代码获取文本。但是我收到错误 :org.json.JSONException: No valu
cocoa - NSMutable NSArray 数组 - 如何避免所有这些行并使用维度或 3D 数组？
enter code here- (void)viewDidLoad { NSMutableArray *imageViewArray= [[NSMutableArray alloc] init];
java - 流式传输 2d 数组、修剪值并收集回 2d 数组
知道如何对二维字符串数组执行修剪操作，例如使用 Java 流 API 进行 3x3 并将其收集回相同维度的 3x3 数组？重点是避免使用显式的 for 循环。当前的解决方案只是简单地执行一个 fo
使用嵌套循环的 Java Union 数组 2 int 数组
已关闭。此问题需要 debugging details 。目前不接受答案。编辑问题以包含 desired behavior, a specific problem or error, and the
Jquery 与 JSON 数组 - 转换为 Javascript 数组
我有来自 ASP.NET Web 服务的以下 XML 输出: 1710 1711 1712 1713
javascript - 更新嵌套数组和对象中的对象。对象-->数组-->对象-->数组--> "object"
如果我有一个对象todo作为您状态的一部分，并且该对象包含数组列表，则列表内部有对象，在这些对象内部还有另一个数组listItems。如何更新数组 listItems 中 id 为“poi098”的对
c# - 如何在一个字节中转换 bool 数组，然后再转换回 bool 数组
我想将最大长度为 8 的 bool 数组打包成一个字节，通过网络发送它，然后将其解压回 bool 数组。已经在这里尝试了一些解决方案，但没有用。我正在使用单声道。我制作了 BitArray，然后尝试
c# - 将 char 数组/字符串转换为 bool 数组
我们的数据库中有这个字段指示一周中的每一天的真/假标志，如下所示:'1111110' 我需要将此值转换为 boolean 数组。为此，我编写了以下代码: char[] freqs = weekday

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

c - 将二维数组的分布式 block 发送到 MPI 中的根进程