c++ - MPI_Irecv not receiving all sends?

Reposted by: 太空狗 · Updated: 2023-10-29 21:47:21

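What I am trying to achieve in this simplified code is:

2 types of processes (root and children, with ids/ranks 10 and 0-9 respectively)
init:
- root listens for each child's "completed" message
- children listen for root's notification that all have completed
while there is no winner (not all done yet):
- each child has a 20% chance of being done (and notifies root that it is done)
- root checks whether all are done
  - if all done: send the "winner" notification to the children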
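
The two decisions in the outline above, a child's 20% roll and root's "all done" check, can be looked at separately from the MPI calls. The sketch below is not part of the original post: `childFinishes`, `countDone` and `allDone` are hypothetical helper names, and the roll is assumed to be a value in 1..10 such as `(rand() % 10) + 1` produces.

```cpp
#include <algorithm>
#include <array>

// Hypothetical helper: roll is assumed to be in 1..10.
// Rolls 1 and 2 count as "finished", i.e. 2 outcomes out of 10 (20%).
bool childFinishes(int roll) { return roll < 3; }

// Hypothetical helper: slot i holds -1 until child i has reported,
// and the child's rank (>= 0) afterwards.
int countDone(const std::array<int, 10>& reported) {
    return static_cast<int>(std::count_if(reported.begin(), reported.end(),
                                          [](int v) { return v >= 0; }));
}

// Root may announce a winner only once every one of the 10 slots is filled.
bool allDone(const std::array<int, 10>& reported) {
    return countDone(reported) == 10;
}
```

Keeping these checks free of MPI makes it easier to see that any trouble in the full program comes from how the messages are received, not from the counting logic.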

I have code like this:

int numprocs, id, arr[10], winner = -1;
bool stop = false;
MPI_Request reqs[10], winnerNotification;

MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &id);

for (int half = 0; half < 1; half++) {
    for (int round = 0; round < 1; round++) {
        if (id == 10) { // root
            // keeps track of who has "completed"
            fill_n(arr, 10, -1);
            for (int i = 0; i < 10; i++) {
                MPI_Irecv(&arr[i], 1, MPI_INT, i, 0, MPI_COMM_WORLD, &reqs[i]);
            }
        } else if (id < 10) { // children
            // listen to root of winner notification/indication to stop
            MPI_Irecv(&winner, 1, MPI_INT, 10, 1, MPI_COMM_WORLD, &winnerNotification);
        }

        while (winner == -1) {
            //cout << id << " is in loop" << endl;

            if (id < 10 && !stop && ((rand() % 10) + 1) < 3) { 
                // children has 20% chance to stop (finish work)
                MPI_Send(&id, 1, MPI_INT, 10, 0, MPI_COMM_WORLD);
                cout << id << " sending to root" << endl;
                stop = true;
            } else if (id == 10) {
                // root checks number of children completed
                int numDone = 0;
                for (int i = 0; i < 10; i++) {
                    if (arr[i] >= 0) {
                        //cout << "root knows that " << i << " has completed" << endl;
                        numDone++;
                    }
                }
                cout << "numDone = " << numDone << endl;

                // if all done, send notification to players to stop
                if (numDone == 10) {
                    winner = 1;
                    for (int i = 0; i < 10; i++) {
                        MPI_Send(&winner, 1, MPI_INT, i, 1, MPI_COMM_WORLD);
                    }
                    cout << "root sent notification of winner" << endl;
                }
            }
        }
    }
}

MPI_Finalize();

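The output of the debug cout statements looks like this. The problem seems to be that root does not receive all of the children's completion notifications?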

2 sending to root
3 sending to root
0 sending to root
4 sending to root
1 sending to root
8 sending to root
9 sending to root
numDone = 1
numDone = 1
... // many numDone = 1, but why 1 only?
7 sending to root
...

I thought maybe I cannot receive into an array element, but I tried:

if (id == 1) {
    int x = 60;
    MPI_Send(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
} else if (id == 0) {
    MPI_Recv(&arr[1], 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    cout << id << " recieved " << arr[1] << endl;
}

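which works.

UPDATE

If I add an MPI_Barrier(MPI_COMM_WORLD) before the end of the while loop, the problem seems to go away, but why? Even if the processes are not in sync, the children will eventually send their completion messages to root, and root is supposed to "listen" for them and process them accordingly? What seems to be happening is that root keeps running and hogs all the resources the children need to execute? Or what is going on here?

UPDATE 2: some children are not receiving the notification from root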

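OK, now that the problem of root not receiving the children's completion notifications is solved by @MichaelSh's answer, I am focusing on children not receiving the notification from the parent. Here is code that reproduces that problem: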

int numprocs, id, arr[10], winner = -1;
bool stop = false;
MPI_Request reqs[10], winnerNotification;

MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &id);

srand(time(NULL) + id);

if (id < 10) {
    MPI_Irecv(&winner, 1, MPI_INT, 10, 0, MPI_COMM_WORLD, &winnerNotification);
}
MPI_Barrier(MPI_COMM_WORLD);

while (winner == -1) {
    cout << id << " is in loop ..." << endl;
    if (id == 10) {
        if (((rand() % 10) + 1) < 2) {
            winner = 2;
            for (int i = 0; i < 10; i++) {
                MPI_Send(&winner, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
            cout << "winner notifications sent" << endl;
        }
    }
}

cout << id << " b4 MPI_Finalize. winner is " << winner << endl;

MPI_Finalize();

The output looks like:

# 1 run
winner notifications sent
10 b4 MPI_Finalize. winner is 2
9 b4 MPI_Finalize. winner is 2
0 b4 MPI_Finalize. winner is 2

# another run
winner notifications sent
10 b4 MPI_Finalize. winner is 2
8 b4 MPI_Finalize. winner is 2

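Notice that some processes do not seem to get the notification from the parent? Why is that? Would calling MPI_Wait in the child processes just hang them? So how might I resolve this?

Also, regarding this comment: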

All MPI_Barrier does in your case -- it waits for child responses to complete. Please check my answer for a better solution

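If I don't do this, I suppose each child's reaction will just take a few milliseconds? So even if I don't wait or use a barrier, I would expect the receive to still happen soon after the send? Unless some processes end up hogging resources and the other processes do not run?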
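
One likely explanation for both symptoms (it is not stated in the original thread): a buffer passed to MPI_Irecv is only guaranteed to hold the message once the request has been completed with a call such as MPI_Wait or MPI_Test, and many MPI implementations only make progress on outstanding requests while the program is inside an MPI call. A loop that merely re-reads `arr[i]` or `winner` makes no such call, which would also explain why adding MPI_Barrier appeared to help. The toy model below is plain C++, not MPI; `ToyLibrary`, `ToyRequest`, `deliver`, `irecv` and `test` are hypothetical names chosen only to illustrate the idea.

```cpp
#include <queue>

// Toy stand-in for an outstanding non-blocking receive (NOT a real MPI type).
struct ToyRequest {
    int* buffer;  // user buffer, like &winner passed to MPI_Irecv
    bool done;    // set once the data has been copied into the buffer
};

// Toy stand-in for a message library that only makes progress inside its own calls.
class ToyLibrary {
public:
    // A message "arrives": it is queued internally, the user buffer is untouched.
    void deliver(int value) { pending_.push(value); }

    // Post a receive: remember where the data should eventually go.
    ToyRequest irecv(int* buffer) { return ToyRequest{buffer, false}; }

    // The only place where queued data is copied into the user buffer.
    bool test(ToyRequest& req) {
        if (!req.done && !pending_.empty()) {
            *req.buffer = pending_.front();
            pending_.pop();
            req.done = true;
        }
        return req.done;
    }

private:
    std::queue<int> pending_;
};

// Mirrors the question's loop: spins on the raw buffer and never calls the library.
bool pollBufferOnly(ToyLibrary& lib, int maxSpins) {
    int winner = -1;
    ToyRequest req = lib.irecv(&winner);
    (void)req;       // the request is never tested or waited on
    lib.deliver(2);  // the message has arrived, but only inside the library
    for (int i = 0; i < maxSpins; ++i) {
        if (winner != -1) return true;
    }
    return false;    // the buffer never changed
}

// Same loop, but each pass gives the library a chance to complete the request.
bool pollWithTest(ToyLibrary& lib, int maxSpins) {
    int winner = -1;
    ToyRequest req = lib.irecv(&winner);
    lib.deliver(2);
    for (int i = 0; i < maxSpins; ++i) {
        if (lib.test(req)) return winner == 2;
    }
    return false;
}
```

In the real program the analogue of `test` would be a call to MPI_Test on `winnerNotification` (or MPI_Testall on `reqs`) inside the loop, or replacing the polling loop with MPI_Wait / MPI_Waitall. How quickly a message becomes visible without such a call depends on the MPI implementation, so the model should be read as an illustration rather than a description of any particular library.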

Best answer

Please try this block of code (error checking omitted for simplicity):

...
// root checks number of children completed
int numDone = 0;
MPI_Status statuses[10];
MPI_Waitall(10, reqs, statuses);
for (int i = 0; i < 10; i++) {
...

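EDIT: a better solution:
Each child posts a receive for root's winner notification and then sends its own completion notification to root.
Root posts receives for the children's notifications into the array, waits until all of them have arrived, and then sends the winner's id to the children. Insert this code after for (int round = 0; round < 1; round++)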

            if (id == 10) 
            { // root
                // keeps track of who has "completed"
                memset(arr, -1, sizeof(arr));
                for (int i = 0; i < 10; i++) 
                {
                    MPI_Irecv(&arr[i], 1, MPI_INT, i, 0, MPI_COMM_WORLD, &reqs[i]);
                }
            } 
            else if (id < 10) 
            { // children
                // listen to root of winner notification/indication to stop
                MPI_Irecv(&winner, 1, MPI_INT, 10, 1, MPI_COMM_WORLD, &winnerNotification);
            }

            if (id < 10)
            {
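                // each pass, a child has a 20% chance (a roll of 1 or 2) of finishing
                // its work; keep working while the roll is 3..10
                while (((rand() % 10) + 1) >= 3) ;

                // notify root that this child has finished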
                MPI_Send(&id, 1, MPI_INT, 10, 0, MPI_COMM_WORLD);
                std::cout << id << " sending to root" << std::endl;
                // receive winner notification
                MPI_Status status;
                MPI_Wait(&winnerNotification, &status);
                // Process winner notification
            } 
            else if (id == 10) 
            {
                MPI_Status statuses[10];
                MPI_Waitall(10, reqs, statuses);                    

                // if all done, send notification to players to stop
                {
                    winner = 1;
                    for (int i = 0; i < 10; i++) 
                    {
                        MPI_Send(&winner, 1, MPI_INT, i, 1, MPI_COMM_WORLD);
                    }
                    std::cout << "root sent notification of winner" << std::endl;
                }
            }

Regarding "c++ - MPI_Irecv not receiving all sends?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13426771/
