c++ - OpenMP scales poorly (cache contention?)

Reposted by: 行者123 | Updated: 2023-11-30 05:27:22

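I wanted to learn more about OpenMP and cache contention, so I wrote a simple program to get a better feel for how it works. For a simple vector addition my thread scaling is poor, and I don't understand why. Here is my program: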

#include <iostream>
#include <omp.h>
#include <vector>

using namespace std;

int main(){

    // Initialize stuff
    int nuElements=20000000; // Number of elements
    int i;
    vector<int> x, y, z;
    x.assign(nuElements,0);
    y.assign(nuElements,0);
    z.assign(nuElements,0);
    double start; // Timer

    for (i=0;i<nuElements;++i){
        x[i]=i;
        y[i]=i;
    }

    // Increase the threads by 1 every time, and add the two vectors  
    for (int t=1;t<5;++t){

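        // Re-set z vector values. assign() keeps z at nuElements elements;
        // clear() would shrink it to size 0, making z[i] below out of bounds.
        z.assign(nuElements,0);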

        // Set number of threads for this iteration
        omp_set_num_threads(t);

        // Start timer
        start=omp_get_wtime();

        // Parallel for
#pragma omp parallel for
        for (i=0;i<nuElements;++i)
        {
            z[i]=x[i]+y[i];
        }
        // Print wall time
        cout<<"Time for "<<omp_get_max_threads()<<" thread(s) : "<<omp_get_wtime()-start<<endl;
    }
    return 0;
}

Running it produces the following output:

Time for 1 thread(s) : 0.020606
Time for 2 thread(s) : 0.022671
Time for 3 thread(s) : 0.026737
Time for 4 thread(s) : 0.02825

I compile it with this command: clang++ -O3 -std=c++11 -fopenmp=libiomp5 test_omp.cpp

As you can see, the scaling gets worse as the number of threads increases. I'm running this on a 4-core Intel i7 processor. Does anyone know what is going on?

Best answer

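You are limited by memory bandwidth, not CPU speed. The loop z[i]=x[i]+y[i] performs one addition for every 12 bytes it moves (two 4-byte loads and one 4-byte store, with a 4-byte int), so when all you do is add and copy, a single core is already enough to keep the memory system busy, and adding more cores doesn't help.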

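If you want to see the benefit of adding more threads, try performing more complex operations on data that is small enough to fit in the L1 or L2 cache.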

A similar question, "c++ - OpenMP scales poorly (cache contention?)", can be found on Stack Overflow: https://stackoverflow.com/questions/37427032/
