
c++ - Difference between #pragma omp parallel and #pragma omp parallel for

Reposted. Author: 行者123. Updated: 2023-12-03 13:15:52

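I am new to OpenMP and I have been trying to run a program that adds two arrays using OpenMP. In the OpenMP tutorial, I learned that we need to use #pragma omp parallel for when using OpenMP on a for loop. But I have also tried the same thing with #pragma omp parallel, and it also gives me the correct output. Below are code snippets of what I am trying to convey.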

#pragma omp parallel for
for (int i = 0; i < n; i++)
{
    c[i] = a[i] + b[i];
}

#pragma omp parallel
{
    for (int i = 0; i < n; i++)
    {
        c[i] = a[i] + b[i];
    }
}

What is the difference between the two?

Best answer

#pragma omp parallel:

creates a parallel region with a team of threads, where each thread executes the entire block of code that the parallel region encloses.

From the OpenMP 5.1 specification one can read a more formal description:

When a thread encounters a parallel construct, a team of threads is created to execute the parallel region (..). The thread that encountered the parallel construct becomes the primary thread of the new team, with a thread number of zero for the duration of the new parallel region. All threads in the new team, including the primary thread, execute the region. Once the team is created, the number of threads in the team remains constant for the duration of that parallel region.
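A minimal sketch of that behaviour, not taken from the original answer: the function name count_region_executions is hypothetical, and the #ifdef _OPENMP guard is an assumption added so the snippet also builds when OpenMP is disabled (the team then has one thread). It counts how many times the body of a parallel region runs and compares that with the team size.

```cpp
#include <utility>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns {times the region body ran, team size}.
std::pair<int, int> count_region_executions() {
    int executions = 0;
    int team_size = 1;  // stays 1 when compiled without OpenMP

    #pragma omp parallel
    {
        // Every thread of the team reaches this line once.
        #pragma omp atomic
        executions++;

        // Only one thread records the team size.
        #pragma omp single
        {
#ifdef _OPENMP
            team_size = omp_get_num_threads();
#endif
        }
    }
    return {executions, team_size};
}
```

Both numbers should match: the region body runs once per thread in the team, whatever the team size happens to be.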

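#pragma omp parallel for:

creates a parallel region (as described before) and assigns the iterations of the loop it encloses to the threads of that region, using the default chunk size and the default schedule, which is typically static. Bear in mind, however, that the default schedule may differ among concrete implementations of the OpenMP standard.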

From the OpenMP 5.1 specification you can read a more formal description:

The worksharing-loop construct specifies that the iterations of one or more associated loops will be executed in parallel by threads in the team in the context of their implicit tasks. The iterations are distributed across threads that already exist in the team that is executing the parallel region to which the worksharing-loop region binds.

Moreover,

The parallel loop construct is a shortcut for specifying a parallel construct containing a loop construct with one or more associated loops and no other statements.

Or, informally, #pragma omp parallel for is a combination of the construct #pragma omp parallel with the construct #pragma omp for. In your case, this means that:

#pragma omp parallel for
for (int i = 0; i < n; i++)
{
    c[i] = a[i] + b[i];
}

is semantically and logically the same as:

#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < n; i++)
    {
        c[i] = a[i] + b[i];
    }
}
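To check the equivalence of the two forms, here is a small sketch. The function names add_combined and add_split are hypothetical, std::vector replaces the raw arrays of the question, and both inputs are assumed to have the same length.

```cpp
#include <vector>

// Combined form: one directive creates the team and splits the loop.
std::vector<int> add_combined(const std::vector<int>& a,
                              const std::vector<int>& b) {
    const int n = static_cast<int>(a.size());
    std::vector<int> c(n, 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
    return c;
}

// Split form: `parallel` creates the team, `for` divides the
// iterations among the threads that already exist in it.
std::vector<int> add_split(const std::vector<int>& a,
                           const std::vector<int>& b) {
    const int n = static_cast<int>(a.size());
    std::vector<int> c(n, 0);
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++) {
            c[i] = a[i] + b[i];
        }
    }
    return c;
}
```

Both functions should return the same vector for the same inputs. The split form is mainly useful when the parallel region contains more work than the single loop.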

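TL;DR: In your example, with #pragma omp parallel for the loop is parallelized among the threads (i.e., the loop iterations are divided among the threads), whereas with #pragma omp parallel all threads execute (in parallel) all the loop iterations.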

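To make it more illustrative, with 4 threads, #pragma omp parallel would result in something like:

[Image missing from this copy. Per the text above, each of the 4 threads executes every loop iteration.]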

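whereas #pragma omp parallel for with chunk_size=1 and a static schedule would result in something like:

[Image missing from this copy. Per the loop shown next, iterations are dealt out round-robin, so with 4 threads thread t runs iterations t, t+4, t+8, and so on.]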

Code-wise, the loop would be transformed into something logically similar to:

for (int i = omp_get_thread_num(); i < n; i += omp_get_num_threads())
{
    c[i] = a[i] + b[i];
}

where omp_get_thread_num():

The omp_get_thread_num routine returns the thread number, within the current team, of the calling thread.

and omp_get_num_threads():

Returns the number of threads in the current team. In a sequential section of the program omp_get_num_threads returns 1.

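or, in other words, for(int i = THREAD_ID; i < n; i += TOTAL_THREADS), with THREAD_ID ranging from 0 to TOTAL_THREADS - 1, and TOTAL_THREADS representing the total number of threads of the team created for the parallel region.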
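The sketch below compares the two distributions by recording which thread ran each iteration. All names (Ownership, thread_id, team_size, owners_with_static_chunk1, owners_with_manual_stride, is_round_robin) are hypothetical, and the #ifdef _OPENMP fallbacks are an assumption so that the code also builds without OpenMP, with one thread whose id is 0.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Fallbacks so the sketch also builds without OpenMP.
static int thread_id() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

static int team_size() {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

struct Ownership {
    std::vector<int> owner;  // owner[i] = id of the thread that ran iteration i
    int team = 1;            // team size observed inside the region
};

// Distribution done by OpenMP: static schedule with chunk size 1.
Ownership owners_with_static_chunk1(int n) {
    Ownership r;
    r.owner.assign(n, -1);
    #pragma omp parallel
    {
        #pragma omp single
        r.team = team_size();

        #pragma omp for schedule(static, 1)
        for (int i = 0; i < n; i++) {
            r.owner[i] = thread_id();
        }
    }
    return r;
}

// Distribution done by hand with the strided loop from the text.
Ownership owners_with_manual_stride(int n) {
    Ownership r;
    r.owner.assign(n, -1);
    #pragma omp parallel
    {
        #pragma omp single
        r.team = team_size();

        for (int i = thread_id(); i < n; i += team_size()) {
            r.owner[i] = thread_id();
        }
    }
    return r;
}

// True when iteration i was run by thread i % team for every i.
bool is_round_robin(const Ownership& r) {
    for (std::size_t i = 0; i < r.owner.size(); i++) {
        if (r.owner[i] != static_cast<int>(i) % r.team) return false;
    }
    return true;
}
```

Both variants should satisfy is_round_robin, since a static schedule with a chunk size of 1 hands out iterations to threads in order of thread number. Other schedules, including an implementation's default, may map iterations to threads differently.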

I have learned that we need to use #pragma omp parallel for while using OpenMP on the for loop. But I have also tried the same thing with #pragma omp parallel and it is also giving me the correct output.

It gives you the same output because in your code:

 c[i]=a[i]+b[i];

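arrays a and b are only read, and c[i] is the only element being updated, and its value does not depend on how many times iteration i is executed. Nevertheless, with #pragma omp parallel for each thread updates its own set of i values, whereas with #pragma omp parallel all threads update the same i values and therefore overwrite each other's results. Strictly speaking, those unsynchronized writes to the same element are a data race, even though every thread stores the same value.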

Now try to do the same with the following code:

#pragma omp parallel for
for (int i = 0; i < n; i++)
{
    c[i] = c[i] + a[i] + b[i];
}

#pragma omp parallel
{
    for (int i = 0; i < n; i++)
    {
        c[i] = c[i] + a[i] + b[i];
    }
}

You will immediately notice the difference.
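A hedged sketch of that experiment follows. The function names accumulate_parallel_for and accumulate_parallel are hypothetical, and the inputs are assumed to have the same length. One deliberate change from the snippet in the text: the replicated version uses #pragma omp atomic, an assumption added here so that the result is deterministic. Without it, the concurrent updates are a data race and the outcome is unpredictable.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Worksharing version: each index is updated exactly once.
std::vector<int> accumulate_parallel_for(std::vector<int> c,
                                         const std::vector<int>& a,
                                         const std::vector<int>& b) {
    const int n = static_cast<int>(c.size());
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        c[i] = c[i] + a[i] + b[i];
    }
    return c;
}

// Replicated version: every thread runs the whole loop, so each
// index is updated once per thread. `team` receives the team size.
std::vector<int> accumulate_parallel(std::vector<int> c,
                                     const std::vector<int>& a,
                                     const std::vector<int>& b,
                                     int& team) {
    const int n = static_cast<int>(c.size());
    int* out = c.data();
    team = 1;  // stays 1 when compiled without OpenMP
    #pragma omp parallel
    {
        #pragma omp single
        {
#ifdef _OPENMP
            team = omp_get_num_threads();
#endif
        }
        for (int i = 0; i < n; i++) {
            // Atomic update added for this sketch to avoid a data race.
            #pragma omp atomic
            out[i] += a[i] + b[i];
        }
    }
    return c;
}
```

With the atomic in place, the replicated version should yield c[i] + team * (a[i] + b[i]), while the worksharing version yields c[i] + a[i] + b[i]. For example, with c[0] = 1, a[0] = 1, b[0] = 10 and a team of 4, the first gives 45 and the second gives 12.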

Regarding c++ - Difference between #pragma omp parallel and #pragma omp parallel for, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65247801/
