c - MPI I/O: read a file by lines, divided evenly among processes (rather than by block size)

Reposted by: 行者123 · Updated: 2023-12-04 03:03:41

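I am new to MPI and have run into this problem. I want to read the contents of a file with more than 20,000 lines and then divide those lines evenly among all processes for further processing. Each line of the file looks like this (two columns of numbers):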

45.87   13.22
45.71   13.27
45.78   13.21
45.67   13.1
45.7    13.24
45.81   13.28
45.85   13.32

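I need to divide the lines evenly at run time among an arbitrary number of processes (the process count could be, for example, 2, 3, 4, 5, ..., 128).

I know how to split the file into blocks, but I need to keep the values on each line intact, so I need to read line by line.

Here is the MPI code and the serial code I am using to do this, but I get a segmentation fault.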

/* Open the file */
MPI_File_open (MPI_COMM_WORLD, "small.txt", MPI_MODE_RDONLY, MPI_INFO_NULL, &myfile);
/* Get the size of the file */
MPI_File_get_size(myfile, &filesize);
/* Calculate how many elements that is */
filesize = filesize/sizeof(char);

/* Calculate how many elements each processor gets */
bufsize = filesize/np;
/* Allocate the buffer to read to, one extra for terminating null char */
buf = (char *) malloc((bufsize+1)*sizeof(char));


/* Set the file view */
MPI_File_set_view(myfile, myid*bufsize*sizeof(char), MPI_CHAR, MPI_CHAR,"native",MPI_INFO_NULL);


Nooflines_Real = count_lines(myfile);
printf("%s contains %d lines\n", argv[1], Nooflines_Real);


int count_lines (FILE *infile) {
  char readline[80];
  int lines=0;
  while( fgets(readline,80,infile) != NULL ) lines++;
  rewind(infile);
  return(lines);
}

Best answer

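Your argument myfile is a variable of type MPI_File, not of type FILE *, so you cannot use it with functions such as fgets() and rewind(). That is probably the source of your segmentation fault.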
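To illustrate the point: once the bytes are in memory (for example after a positioned read such as MPI_File_read_at, which does take an MPI_File handle), lines can be counted by scanning the buffer instead of calling fgets(). The helper below is only a minimal sketch; the name count_lines_in_buffer is hypothetical and is not part of MPI or of the code in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the lines held in buf[0..n-1].
 * The buffer is assumed to have been filled beforehand, e.g. by
 * MPI_File_read_at(); no stdio FILE * is involved.
 * A final line without a trailing '\n' is counted as a line too. */
int count_lines_in_buffer(const char *buf, size_t n) {
    assert(buf != NULL || n == 0);
    int lines = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] == '\n') lines++;
    if (n > 0 && buf[n - 1] != '\n') lines++;
    return lines;
}
```

Note that a count taken this way on a byte-based chunk only describes that chunk; a line cut in half at a chunk boundary still has to be assigned to exactly one process.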

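My suggestion is to take the approach in this answer and read overlapping blocks of the file (to account for the fact that you don't know how long a line is), with each task reading in its own block and processing its own lines. If you really care about each process having exactly the same number of lines (to the extent possible), you can have the processes exchange data among themselves afterwards so that the counts match.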
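If the line counts do need to be balanced afterwards, each process first has to know how many lines it should end up with. The sketch below shows one possible way to compute that target, assuming the total line count is already known on every rank (for instance from a sum reduction over the local counts). The function names are hypothetical, and the actual exchange of lines between ranks is not shown.

```c
#include <assert.h>

/* Hypothetical helper: target number of lines for `rank` when `total`
 * lines are spread as evenly as possible over `size` ranks.
 * The first (total % size) ranks get one extra line. */
long lines_for_rank(long total, int size, int rank) {
    assert(total >= 0 && size > 0 && rank >= 0 && rank < size);
    long base = total / size;
    long extra = total % size;
    return base + (rank < extra ? 1 : 0);
}

/* Lines that `rank` would have to give away (positive result) or
 * receive (negative result) to reach its target, given how many
 * lines it currently holds. */
long surplus_for_rank(long have, long total, int size, int rank) {
    return have - lines_for_rank(total, size, rank);
}
```

For a 20,000-line file on 3 processes this gives targets of 6667, 6667 and 6666 lines.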

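Update: if you really want to do it this way (note that if your input is all numbers, a binary format would be much easier), here is some code that reads a text file, partitions it across the processes, and then processes each line (here by summing the two columns), as a direct extension of the answer linked above: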
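As a rough illustration of why a binary format is easier: with fixed-size records, every process can compute its own byte range directly, with no overlap and no search for newlines. The sketch below assumes a record made of two floats (one per column); the struct and function names are hypothetical and do not come from the original answer.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed record layout: one input line = two floats. */
struct point { float a, b; };

/* Hypothetical helper: which records does `rank` own when `nrecords`
 * fixed-size records are split over `size` ranks?
 * Rank r owns [r*n/size, (r+1)*n/size), so counts differ by at most one. */
void record_range(long long nrecords, int size, int rank,
                  long long *first, long long *count) {
    assert(nrecords >= 0 && size > 0 && rank >= 0 && rank < size);
    long long lo = (nrecords * rank) / size;
    long long hi = (nrecords * (rank + 1)) / size;
    *first = lo;
    *count = hi - lo;
}

/* Byte offset of record `index` in the file; this is the kind of value
 * that could be passed as the offset of a positioned read. */
long long record_offset(long long index) {
    return index * (long long)sizeof(struct point);
}
```

The number of records would simply be the file size divided by sizeof(struct point), so no pass over the data is needed to find the boundaries.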

#include <stdio.h>
#include <mpi.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>

void readlines(MPI_File *in, const int rank, const int size, const int overlap,
               char ***lines, int *nlines) {
    MPI_Offset filesize;
    MPI_Offset localsize;
    MPI_Offset start;
    MPI_Offset end;
    char *chunk;

    /* figure out who reads what */

    MPI_File_get_size(*in, &filesize);
    MPI_Offset mysize = filesize/size;   /* bytes nominally owned by each rank */
    start = rank * mysize;
    end   = start + mysize - 1;

    /* add overlap to the end of everyone's chunk... */
    end += overlap;

    /* ...but never read past the last byte of the file; the last
     * processor also picks up the remainder of the division */
    if (rank == size-1 || end > filesize-1) end = filesize - 1;

    localsize =  end - start + 1;

    /* allocate memory */
    chunk = malloc( (localsize + 1)*sizeof(char));

    /* everyone reads in their part */
    MPI_File_read_at_all(*in, start, chunk, (int)localsize, MPI_CHAR, MPI_STATUS_IGNORE);
    chunk[localsize] = '\0';

    /*
     * everyone calculate what their start and end *really* are by going 
     * from the first newline after start to the first newline after the
     * overlap region starts (eg, after end - overlap + 1);
     * this assumes that no line is longer than overlap bytes
     */

    MPI_Offset locstart=0, locend=localsize-1;
    if (rank != 0) {
        /* the partial first line belongs to the previous rank */
        while (locstart < localsize && chunk[locstart] != '\n') locstart++;
        locstart++;
    }
    if (rank != size-1) {
        locend = mysize;
        while (locend < localsize-1 && chunk[locend] != '\n') locend++;
        if (locend > localsize-1) locend = localsize-1;
    }
    localsize = locend-locstart+1;
    if (localsize < 0) localsize = 0;

    /* Now let's copy our actual data over into a new array, with no overlaps */
    char *data = (char *)malloc((localsize+1)*sizeof(char));
    memcpy(data, &(chunk[locstart]), localsize);
    data[localsize] = '\0';
    free(chunk);

    /* Now we'll count the number of lines; a last line with no trailing newline still counts */
    *nlines = 0;
    for (MPI_Offset i=0; i<localsize; i++)
        if (data[i] == '\n') (*nlines)++;
    if (localsize > 0 && data[localsize-1] != '\n') (*nlines)++;

    /* Now the array lines will point into the data array at the start of each line. */
    /* (*lines)[0] is always data, even when nlines == 0, so the caller can free it */
    *lines = (char **)malloc((*nlines + 1)*sizeof(char *));
    (*lines)[0] = data;
    char *p = data;
    for (int i=0; i<(*nlines); i++) {
        (*lines)[i] = p;
        char *nl = strchr(p, '\n');
        if (nl == NULL) break;
        *nl = '\0';       /* terminate this line in place */
        p = nl + 1;
    }

    return;
}

void processlines(char **lines, const int nlines, const int rank) {
    for (int i=0; i<nlines; i++) {
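        float a, b;
        /* skip blank or malformed lines rather than printing uninitialized values */
        if (lines[i] == NULL || sscanf(lines[i],"%f %f", &a, &b) != 2) continue;
        printf("%d: <%s>: %f + %f = %f\n", rank, lines[i], a, b, a+b);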
    }
}

int main(int argc, char **argv) {

    MPI_File in;
    int rank, size;
    int ierr;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (argc != 2) {
        if (rank == 0) fprintf(stderr, "Usage: %s infilename\n", argv[0]);
        MPI_Finalize();
        exit(1);
    }

    ierr = MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY, MPI_INFO_NULL, &in);
    if (ierr) {
        if (rank == 0) fprintf(stderr, "%s: Couldn't open file %s\n", argv[0], argv[1]);
        MPI_Finalize();
        exit(2);
    }

    const int overlap=100;
    char **lines;
    int nlines;
    readlines(&in, rank, size, overlap, &lines, &nlines);

    printf("Rank %d has %d lines\n", rank, nlines);

    processlines(lines, nlines, rank);

    free(lines[0]);
    free(lines);

    MPI_File_close(&in);

    MPI_Finalize();
    return 0;
}

Then, running it on the data set you provided:

$ mpirun -np 2 ./textio foo2.in 
Rank 0 has 4 lines
0: <45.87   13.22>: 45.869999 + 13.220000 = 59.090000
0: <45.71   13.27>: 45.709999 + 13.270000 = 58.980000
0: <45.78   13.21>: 45.779999 + 13.210000 = 58.989998
0: <45.67   13.1>: 45.669998 + 13.100000 = 58.769997
Rank 1 has 3 lines
1: <45.7    13.24>: 45.700001 + 13.240000 = 58.940002
1: <45.81   13.28>: 45.810001 + 13.280000 = 59.090000
1: <45.85   13.32>: 45.849998 + 13.320000 = 59.169998
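
The uneven 4/3 split follows from the file being divided by bytes, not by lines. The following sketch simulates that rule on an in-memory string without any MPI calls. The function name owned_lines is hypothetical, and the sketch assumes the overlap is large enough to reach the next newline, so it does not model the overlap size at all.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical simulation (no MPI): how many lines of the in-memory
 * text `file` would `rank` own under a byte-based split?
 * Rank r nominally owns bytes [r*mysize, (r+1)*mysize); its first line
 * starts after the first '\n' at or beyond its start (rank 0 starts at
 * byte 0) and its last line ends at the first '\n' at or beyond the
 * next rank's start (the last rank runs to the end of the file). */
int owned_lines(const char *file, int rank, int size) {
    assert(file != NULL && size > 0 && rank >= 0 && rank < size);
    long filesize = (long)strlen(file);
    long mysize = filesize / size;
    long lo = rank * mysize;          /* first byte considered */
    long hi = filesize;               /* one past the last owned byte */

    if (rank != 0) {
        while (lo < filesize && file[lo] != '\n') lo++;
        if (lo < filesize) lo++;      /* start just after that newline */
    }
    if (rank != size - 1) {
        hi = (rank + 1) * mysize;
        while (hi < filesize && file[hi] != '\n') hi++;
        if (hi < filesize) hi++;      /* include the newline itself */
    }

    int lines = 0;
    for (long i = lo; i < hi; i++)
        if (file[i] == '\n') lines++;
    return lines;
}
```

Assuming the input holds exactly the seven sample lines with the spacing shown and a single trailing newline, the file is 97 bytes long, so with two processes the nominal boundary is at byte 48, in the middle of the fourth line. That line therefore goes to rank 0, which ends up with 4 lines, and rank 1 gets the remaining 3.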

This post on "c - MPI I/O: read a file by lines, divided evenly among processes (rather than by block size)" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/13327127/
