
c++ - Boost.MPI: What's received isn't what was sent!

Reposted. Author: 太空狗. Updated: 2023-10-29 19:38:15

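I'm fairly new to Boost.MPI. I've installed the libraries and the code compiles, but I'm getting a very odd error: some of the integer data received by the slave nodes is not what the master sent. What is going on?

I'm using Boost version 1.42.0 and compiling with mpic++ (which wraps g++ on one cluster and icpc on the other). A reduced example follows, together with its output.

Code: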

#include <cstdlib>
#include <iostream>
#include <vector>
#include <boost/mpi.hpp>

using namespace std;
namespace mpi = boost::mpi;

class Solution
{
public:
    Solution() :
        solution_num(num_solutions++)
    {
        // Master node's constructor
    }

    Solution(int solutionNum) :
        solution_num(solutionNum)
    {
        // Slave nodes' constructor.
    }

    int solutionNum() const
    {
        return solution_num;
    }

private:
    static int num_solutions;
    int solution_num;
};

int Solution::num_solutions = 0;

int main(int argc, char* argv[])
{
    // Initialization of MPI
    mpi::environment env(argc, argv);
    mpi::communicator world;

    if (world.rank() == 0)
    {
        // Create solutions
        int numSolutions = world.size() - 1; // One solution per slave
        vector<Solution*> solutions(numSolutions);
        for (int sol = 0; sol < numSolutions; ++sol)
        {
            solutions[sol] = new Solution;
        }

        // Send solutions
        for (int sol = 0; sol < numSolutions; ++sol)
        {
            world.isend(sol + 1, 0, false); // Tells the slave to expect work
            cout << "Sending solution no. " << solutions[sol]->solutionNum() << " to node " << sol + 1 << endl;
            world.isend(sol + 1, 1, solutions[sol]->solutionNum());
        }

        // Retrieve values (solution numbers squared)
        vector<double> values(numSolutions, 0);
        for (int i = 0; i < numSolutions; ++i)
        {
            // Get values for each solution
            double value = 0;
            mpi::status status = world.recv(mpi::any_source, 2, value);
            int source = status.source();

            int sol = source - 1;
            values[sol] = value;
        }
        for (int i = 1; i <= numSolutions; ++i)
        {
            world.isend(i, 0, true); // Tells the slave to finish
        }

        // Output the solutions numbers and their squares
        for (int i = 0; i < numSolutions; ++i)
        {
            cout << solutions[i]->solutionNum() << ", " << values[i] << endl;
            delete solutions[i];
        }
    }
    else
    {
        // Slave nodes merely square the solution number
        bool finished;
        mpi::status status = world.recv(0, 0, finished);
        while (!finished)
        {
            int solNum;
            world.recv(0, 1, solNum);
            cout << "Node " << world.rank() << " receiving solution no. " << solNum << endl;

            Solution solution(solNum);
            double value = static_cast<double>(solNum * solNum);
            world.send(0, 2, value);

            status = world.recv(0, 0, finished);
        }

        cout << "Node " << world.rank() << " finished." << endl;
    }

    return EXIT_SUCCESS;
}

Running this on 21 nodes (1 master, 20 slaves) produces:

Sending solution no. 0 to node 1
Sending solution no. 1 to node 2
Sending solution no. 2 to node 3
Sending solution no. 3 to node 4
Sending solution no. 4 to node 5
Sending solution no. 5 to node 6
Sending solution no. 6 to node 7
Sending solution no. 7 to node 8
Sending solution no. 8 to node 9
Sending solution no. 9 to node 10
Sending solution no. 10 to node 11
Sending solution no. 11 to node 12
Sending solution no. 12 to node 13
Sending solution no. 13 to node 14
Sending solution no. 14 to node 15
Sending solution no. 15 to node 16
Sending solution no. 16 to node 17
Sending solution no. 17 to node 18
Sending solution no. 18 to node 19
Sending solution no. 19 to node 20
Node 1 receiving solution no. 0
Node 2 receiving solution no. 1
Node 12 receiving solution no. 19
Node 3 receiving solution no. 19
Node 15 receiving solution no. 19
Node 13 receiving solution no. 19
Node 4 receiving solution no. 19
Node 9 receiving solution no. 19
Node 10 receiving solution no. 19
Node 14 receiving solution no. 19
Node 6 receiving solution no. 19
Node 5 receiving solution no. 19
Node 11 receiving solution no. 19
Node 8 receiving solution no. 19
Node 16 receiving solution no. 19
Node 19 receiving solution no. 19
Node 20 receiving solution no. 19
Node 1 finished.
Node 2 finished.
Node 7 receiving solution no. 19
0, 0
1, 1
2, 361
3, 361
4, 361
5, 361
6, 361
7, 361
8, 361
9, 361
10, 361
11, 361
12, 361
13, 361
14, 361
15, 361
16, 361
17, 361
18, 361
19, 361
Node 6 finished.
Node 3 finished.
Node 17 receiving solution no. 19
Node 17 finished.
Node 10 finished.
Node 12 finished.
Node 8 finished.
Node 4 finished.
Node 15 finished.
Node 18 receiving solution no. 19
Node 18 finished.
Node 11 finished.
Node 13 finished.
Node 20 finished.
Node 16 finished.
Node 9 finished.
Node 19 finished.
Node 7 finished.
Node 5 finished.
Node 14 finished.

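So, while the master sends 0 to node 1, 1 to node 2, 2 to node 3 and so on, most of the slave nodes (for some reason) receive the number 19. Instead of the squares of the numbers from 0 to 19, we get the square of 0, the square of 1, and the square of 19 eighteen times!

Thanks in advance to anyone who can explain this.

Alan

Best answer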

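OK, I think I have found the answer, which requires some knowledge of the underlying C-style MPI calls. Boost's "isend" function is essentially a wrapper around "MPI_Isend", and it does not shield the user from having to know some of the details of how "MPI_Isend" works.

One parameter of "MPI_Isend" is a pointer to a buffer containing the information you wish to send. Importantly, however, this buffer must not be reused until you know the message has been received or, more precisely, until the send has completed (which is what waiting on or testing the returned request tells you). So consider the following code: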

// Get solution numbers from the solutions and store in a vector
vector<int> solutionNums(numSolutions);
for (int sol = 0; sol < numSolutions; ++sol)
{
    solutionNums[sol] = solutions[sol]->solutionNum();
}

// Send solution numbers
for (int sol = 0; sol < numSolutions; ++sol)
{
    world.isend(sol + 1, 0, false); // Indicates that we have not finished, and to expect a solution representation
    cout << "Sending solution no. " << solutionNums[sol] << " to node " << sol + 1 << endl;
    world.isend(sol + 1, 1, solutionNums[sol]);
}

This works fine, because each solution number has its own location in memory. Now consider the following small tweak:

// Create solutionNum array
vector<int> solutionNums(numSolutions);
for (int sol = 0; sol < numSolutions; ++sol)
{
    solutionNums[sol] = solutions[sol]->solutionNum();
}

// Send solutions
for (int sol = 0; sol < numSolutions; ++sol)
{
    int solNum = solutionNums[sol];
    world.isend(sol + 1, 0, false); // Indicates that we have not finished, and to expect a solution representation
    cout << "Sending solution no. " << solNum << " to node " << sol + 1 << endl;
    world.isend(sol + 1, 1, solNum);
}

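Now the underlying "MPI_Isend" call is given a pointer to solNum. Unfortunately, that piece of memory is overwritten each time around the loop, so while it may look as though the number 4 is sent to node 5, by the time the send actually takes place the new contents of that memory location (say 19) are passed instead.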
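The effect can be reproduced without MPI at all. The following is a minimal sketch, not Boost.MPI code: DeferredSender, sendFromDistinctBuffers and sendFromReusedBuffer are hypothetical names invented for illustration, and the model assumes that every transfer happens only when flush() is called (a real MPI library may transfer some messages earlier, which would explain why nodes 1 and 2 above received correct values). The reused variable is declared outside the loop so that it is still alive when it is read, which keeps the sketch free of undefined behaviour.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a nonblocking send queue; not part of MPI or Boost.MPI.
class DeferredSender
{
public:
    // Like a nonblocking send, this records only WHERE the value lives.
    void isend(int dest, const int& value)
    {
        pending_.push_back(std::make_pair(dest, &value));
    }

    // Models the transfer happening later: each buffer is read only now.
    std::vector<std::pair<int, int> > flush()
    {
        std::vector<std::pair<int, int> > delivered;
        for (std::size_t i = 0; i < pending_.size(); ++i)
        {
            delivered.push_back(std::make_pair(pending_[i].first, *pending_[i].second));
        }
        pending_.clear();
        return delivered;
    }

private:
    std::vector<std::pair<int, const int*> > pending_;
};

// One buffer per message: each pointer still refers to its own value at flush time.
std::vector<std::pair<int, int> > sendFromDistinctBuffers(int numSolutions)
{
    std::vector<int> solutionNums(numSolutions);
    for (int sol = 0; sol < numSolutions; ++sol)
    {
        solutionNums[sol] = sol;
    }

    DeferredSender sender;
    for (int sol = 0; sol < numSolutions; ++sol)
    {
        sender.isend(sol + 1, solutionNums[sol]);
    }
    return sender.flush();
}

// One buffer reused for every message: all pointers alias the same int.
std::vector<std::pair<int, int> > sendFromReusedBuffer(int numSolutions)
{
    DeferredSender sender;
    int solNum = 0; // outlives the loop so that reading it in flush() is well defined
    for (int sol = 0; sol < numSolutions; ++sol)
    {
        solNum = sol; // overwrites the value that earlier "sends" still point to
        sender.isend(sol + 1, solNum);
    }
    return sender.flush();
}
```

Under this model, sendFromDistinctBuffers(20) delivers 0 to 19 to destinations 1 to 20, while sendFromReusedBuffer(20) delivers 19 to every destination, because all twenty queued sends read the same memory location after its final write.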

Now consider the original code:

// Send solutions
for (int sol = 0; sol < numSolutions; ++sol)
{
    world.isend(sol + 1, 0, false); // Tells the slave to expect work
    cout << "Sending solution no. " << solutions[sol]->solutionNum() << " to node " << sol + 1 << endl;
    world.isend(sol + 1, 1, solutions[sol]->solutionNum());
}

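Here we are passing a temporary. Again, the location of this temporary in memory is overwritten each time around the loop, and again the wrong data is sent to the slave nodes.

As it happens, I have been able to restructure my "real" code to use "send" instead of "isend". However, if I need to use "isend" in future, I'll be more careful!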

Regarding "c++ - Boost.MPI: What's received isn't what was sent!", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4024940/
