
c++ - Can someone explain why using OpenMP sections runs slower than a single thread?

Reposted. Author: 行者123. Updated: 2023-11-28 05:01:24

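I am new to parallel programming. Based on the sample code below, can someone explain why the version using OpenMP sections runs slower than the single-threaded version? Any suggestions for improving it?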

#include <iostream>
#include <vector>
#include <chrono>
#include <numeric>
#include <omp.h>

using namespace std;

int Calculation_1(int A, int B);
int Calculation_2(int A, int B);
int Calculation_3(int A, int B);
int Calculation_4(int A, int B);

int main() {

    vector<int> W;
    vector<int> X;
    vector<int> Y;
    vector<int> Z;

    // Parallel version: one OpenMP section per calculation
    chrono::steady_clock::time_point begin1 = std::chrono::steady_clock::now();
    omp_set_num_threads(4);

    #pragma omp parallel
    {
        #pragma omp sections nowait
        {
            #pragma omp section
            {
                W.push_back(Calculation_1(5, 5));
            }
            #pragma omp section
            {
                X.push_back(Calculation_2(5, 5));
            }
            #pragma omp section
            {
                Y.push_back(Calculation_3(5, 5));
            }
            #pragma omp section
            {
                Z.push_back(Calculation_4(5, 5));
            }
        }
    }

    cout << "Parallel = " << accumulate(W.begin(), W.end(), 0) + accumulate(X.begin(), X.end(), 0) + accumulate(Y.begin(), Y.end(), 0) + accumulate(Z.begin(), Z.end(), 0) << endl;
    chrono::steady_clock::time_point end1 = std::chrono::steady_clock::now();
    cout << "Time difference = " << std::chrono::duration_cast<std::chrono::nanoseconds>(end1 - begin1).count() << std::endl;

    // Clear the vectors before the second measurement
    W.clear();
    X.clear();
    Y.clear();
    Z.clear();

    // Single-threaded version
    chrono::steady_clock::time_point begin2 = std::chrono::steady_clock::now();

    W.push_back(Calculation_1(5, 5));
    X.push_back(Calculation_2(5, 5));
    Y.push_back(Calculation_3(5, 5));
    Z.push_back(Calculation_4(5, 5));

    cout << "single = " << accumulate(W.begin(), W.end(), 0) + accumulate(X.begin(), X.end(), 0) + accumulate(Y.begin(), Y.end(), 0) + accumulate(Z.begin(), Z.end(), 0) << endl;
    chrono::steady_clock::time_point end2 = std::chrono::steady_clock::now();
    cout << "Time difference = " << std::chrono::duration_cast<std::chrono::nanoseconds>(end2 - begin2).count() << std::endl;

    cin.get();

    return 0;

}

int Calculation_1(int A, int B) {
    return A + B;
}
int Calculation_2(int A, int B) {
    return A + B;
}
int Calculation_3(int A, int B) {
    return A + B;
}
int Calculation_4(int A, int B) {
    return A + B;
}

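The result is: Parallel = 40, time = 9168172 ns

Single = 40, time = 225580 ns

The parallel version is about 40 times slower than the single-threaded one.

I also tried inserting many numbers into the vectors, as suggested (code below). The result: the parallel version is about 9 times slower than the single-threaded one.

Parallel time = 12907862 ns

Single time = 1334519 ns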

// Parallel version: each section fills its own vector
chrono::steady_clock::time_point begin1 = std::chrono::steady_clock::now();
omp_set_num_threads(2);

#pragma omp parallel
{
    #pragma omp sections nowait
    {
        #pragma omp section
        {
            for (int i = 0; i < 100000; i++) {
                X.push_back(i);
            }
        }
        #pragma omp section
        {
            for (int j = 0; j < 100000; j++) {
                Y.push_back(j);
            }
        }
    }
}

// 0LL: the sum of 0..99999 (4999950000) does not fit in a 32-bit int
cout << "Parallel = " << accumulate(X.begin(), X.end(), 0LL) + accumulate(Y.begin(), Y.end(), 0LL) << endl;
chrono::steady_clock::time_point end1 = std::chrono::steady_clock::now();
cout << "Time difference = " << std::chrono::duration_cast<std::chrono::nanoseconds>(end1 - begin1).count() << std::endl;

// Clear the vectors before the second measurement
X.clear();
Y.clear();

// Single-threaded version
chrono::steady_clock::time_point begin2 = std::chrono::steady_clock::now();

for (int i = 0; i < 100000; i++) {
    X.push_back(i);
}
for (int j = 0; j < 100000; j++) {
    Y.push_back(j);
}

cout << "single = " << accumulate(X.begin(), X.end(), 0LL) + accumulate(Y.begin(), Y.end(), 0LL) << endl;
chrono::steady_clock::time_point end2 = std::chrono::steady_clock::now();
cout << "Time difference = " << std::chrono::duration_cast<std::chrono::nanoseconds>(end2 - begin2).count() << std::endl;

Many thanks,

Best answer

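Also note that for calculations this simple, the time it takes to spawn the threads can easily be longer than the time it takes to compute everything in a single thread. And as user0042 said, if your program spawns more threads than the machine has cores, the threads have to share the cores: the scheduler keeps switching them in and out, and that switching slows the calculation down as well.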

Regarding "c++ - Can someone explain why using OpenMP sections runs slower than a single thread?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45848608/
