
Clarification of "region cannot be closely nested inside 'parallel' region"

Reposted. Author: 行者123. Last updated: 2023-12-03 13:22:49

I am trying to understand how reduction works in OpenMP.
I have this simple code that involves a reduction.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
int N = 100;
int M = 200;
int O = 300;

double r2() {
    return ((double) rand() / (double) RAND_MAX);
}

int main(void) {
    double S = 0;
    double *K = (double*) calloc(M * N, sizeof(double));
    #pragma omp parallel for collapse(2)
    {
        for (int m = 0; m < M; m++) {
            for (int n = 0; n < N; n++) {
                #pragma omp for reduction(+:S)
                for (int o = 0; o < O; o++) {
                    S += r2() - 0.25;
                }
                K[m * N + n] = S;
            }
        }
    }
}
I get this error message:

test.cc:30:1: error: region cannot be closely nested inside 'parallel for' region; perhaps you forget to enclose 'omp for' directive into a parallel region?
#pragma omp for reduction(+:S)
^


If I do this instead, it compiles:
#pragma omp parallel for reduction(+:S)
Is this the right way to do a nested loop?

Edit:
Making a change in the original question. I want the parallel and sequential code to have the same result.
#pragma omp parallel for collapse(2)
for (int m = 0; m < M; m++) {
    for (int n = 0; n < N; n++) {
        #pragma omp for reduction(+:S)
        for (int o = 0; o < O; o++) {
            S += o;
        }
        K[m * N + n] = S;
    }
}

Best answer

Important TL;DR: rand is not thread-safe.
From the rand man page:

The function rand() is not reentrant or thread-safe, since it uses hidden state that is modified on each call.


For multithreaded code, use (for example) rand_r instead.

I am trying to understand how reduction works in OpenMP.


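For the sake of argument, let us assume that r2() always produces the same value.
When code has multiple threads modifying the same variable concurrently, as in the following: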
double S = 0;
#pragma omp parallel
for (int o = 0; o < O; o++) {
    S += r2() - 0.25;
}
there is a race condition on the updates of the variable S. To solve it, one can use the OpenMP reduction clause. From the OpenMP standard one can read:

The reduction clause can be used to perform some forms of recurrence calculations (...) in parallel. For parallel and work-sharing constructs, a private copy of each list item is created, one for each implicit task, as if the private clause had been used. (...) The private copy is then initialized as specified above. At the end of the region for which the reduction clause was specified, the original list item is updated by combining its original value with the final value of each of the private copies, using the combiner of the specified reduction-identifier.


In that case, the code would look like this:
#pragma omp parallel for reduction(+:S)
for (int o = 0; o < O; o++) {
    S += r2() - 0.25;
}
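
To see the reduction clause in isolation, here is a minimal sketch. The function name sum_with_reduction is hypothetical, and the random term is replaced by the loop index (an assumption made only so the result is deterministic). Each thread accumulates into its own private copy of S, and the copies are added into the original S when the loop ends.

```c
/* Hypothetical helper: sums the integers 0 .. O-1 with an OpenMP reduction.
 * The loop index stands in for r2() so that the result is deterministic. */
double sum_with_reduction(int O) {
    double S = 0.0;
    /* Each thread gets a private S initialised to 0; the private copies
     * are combined with + into the original S at the end of the loop. */
    #pragma omp parallel for reduction(+:S)
    for (int o = 0; o < O; o++) {
        S += o;
    }
    return S;
}
```

Compile with an OpenMP flag such as -fopenmp. Without it the pragma is ignored and the loop simply runs sequentially, which gives the same sum here because every partial sum is an exactly representable integer.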
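However, in your full code
#pragma omp parallel for collapse(2)
for (int m = 0; m < M; m++) {
    for (int n = 0; n < N; n++) {
        #pragma omp for reduction(+:S)
        for (int o = 0; o < O; o++) {
            S += r2() - 0.25;
        }
        K[m * N + n] = S;
    }
}
you first divide the iterations of the two outer loops among the threads with #pragma omp parallel for collapse(2), and then you try to divide the iterations of the innermost loop again with a second worksharing directive, #pragma omp for. This is not allowed: a worksharing region may not be closely nested inside another worksharing region.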

Is this the right way to do a nested loop?


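You can do the following parallelization:
#pragma omp parallel for collapse(2) firstprivate(S)
for (int m = 0; m < M; m++) {
    for (int n = 0; n < N; n++) {
        for (int o = 0; o < O; o++) {
            S += r2() - 0.25;
        }
        K[m * N + n] = S;
    }
}
There is no race condition on S, because each thread works on its own private copy. Moreover, since the iterations of the two outermost loops are divided among the threads, each thread gets a unique set of (m, n) pairs, and therefore each thread writes to a unique position of the array K in the access K[m * N + n].
The problem is that the version that parallelizes the two outer loops will not produce the same results as its sequential counterpart. This is because
for (int o = 0; o < O; o++) {
    S += r2() - 0.25;
}
K[m * N + n] = S;
adds an implicit dependency across all the iterations of the three loops: S is never reset, so its value depends on the order in which the iterations of m, n and o are executed. Therefore, if the iterations of those loops are divided among threads, the value of S for a given m and n will differ depending on whether the code is executed sequentially or in parallel. Nonetheless, this can be solved by parallelizing only the innermost loop and reducing the variable S: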
for (int m = 0; m < M; m++) {
    for (int n = 0; n < N; n++) {
        #pragma omp parallel for reduction(+:S)
        for (int o = 0; o < O; o++) {
            S += r2() - 0.25;
        }
        K[m * N + n] = S;
    }
}
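
The order dependence of the firstprivate variant can be observed with a rough sketch like the one below. The helper name count_mismatches_firstprivate is hypothetical, and the random term is replaced by S += o (an assumption, so that a sequential run has a known closed form: the entry at index i equals (i + 1) * O * (O - 1) / 2). The function counts how many entries differ from that sequential value.

```c
#include <stdlib.h>

/* Hypothetical helper: fills K with the firstprivate(S) variant and returns
 * the number of entries that differ from what a sequential run would store.
 * Returns -1 if the allocation fails. */
int count_mismatches_firstprivate(int M, int N, int O) {
    double *K = calloc((size_t)M * N, sizeof *K);
    if (K == NULL) {
        return -1;
    }
    double S = 0.0;
    /* Every thread starts from its own copy of S (initially 0) and keeps
     * accumulating across the (m, n) iterations assigned to it. */
    #pragma omp parallel for collapse(2) firstprivate(S)
    for (int m = 0; m < M; m++) {
        for (int n = 0; n < N; n++) {
            for (int o = 0; o < O; o++) {
                S += o;
            }
            K[m * N + n] = S;
        }
    }
    /* Sum of 0 .. O-1, added once per (m, n) iteration in a sequential run. */
    double T = (double)O * (O - 1) / 2.0;
    int mismatches = 0;
    for (int i = 0; i < M * N; i++) {
        if (K[i] != (i + 1) * T) {
            mismatches++;
        }
    }
    free(K);
    return mismatches;
}
```

With a single thread (or when compiled without OpenMP) the count is 0. With several threads it is usually nonzero, and the exact number depends on how the runtime distributes the iterations, so no particular value should be expected.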
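All of this matters, of course, only if you care about the value of S. One could argue that, since you are using a function that produces random values, preserving the order in which S is accumulated is not paramount.
Versions with a thread-safe random generator
Version 1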
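/* S is made private per thread (as in the firstprivate version above)
   so that the threads do not race on it. */
#pragma omp parallel firstprivate(S)
{
    /* Each thread keeps its own generator state for rand_r. */
    unsigned int myseed = omp_get_thread_num();
    #pragma omp for collapse(2)
    for (int m = 0; m < M; m++) {
        for (int n = 0; n < N; n++) {
            for (int o = 0; o < O; o++) {
                double r = ((double) rand_r(&myseed) / (double) RAND_MAX);
                S += r - 0.25;
            }
            K[m * N + n] = S;
        }
    }
}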
Version 2
double *K = (double*) calloc(M * N, sizeof(double));
for (int m = 0; m < M; m++) {
    for (int n = 0; n < N; n++) {
        #pragma omp parallel
        {
            unsigned int myseed = omp_get_thread_num();
            #pragma omp for reduction(+:S)
            for (int o = 0; o < O; o++) {
                double r = ((double) rand_r(&myseed) / (double) RAND_MAX);
                S += r - 0.25;
            }
        }
        K[m * N + n] = S;
    }
}
Edit:

Making a change in the original question. I want the parallel and sequential code to have the same result.


Instead of:
#pragma omp parallel for collapse(2)
for (int m = 0; m < M; m++) {
    for (int n = 0; n < N; n++) {
        #pragma omp for reduction(+:S)
        for (int o = 0; o < O; o++) {
            S += o;
        }
        K[m * N + n] = S;
    }
}
do:
for (int m = 0; m < M; m++) {
    for (int n = 0; n < N; n++) {
        #pragma omp parallel for reduction(+:S)
        for (int o = 0; o < O; o++) {
            S += o;
        }
        K[m * N + n] = S;
    }
}
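
One way to check that this version agrees with the sequential code is sketched below. The helper name inner_reduction_matches_sequential is hypothetical. It relies on the fact that, with S += o and S never reset, a sequential run stores (i + 1) * O * (O - 1) / 2 at index i = m * N + n.

```c
#include <stdlib.h>

/* Hypothetical helper: runs the inner-loop reduction variant and returns 1
 * if every entry of K equals the value a sequential run would produce,
 * 0 otherwise (including allocation failure). */
int inner_reduction_matches_sequential(int M, int N, int O) {
    double *K = calloc((size_t)M * N, sizeof *K);
    if (K == NULL) {
        return 0;
    }
    double S = 0.0;
    for (int m = 0; m < M; m++) {
        for (int n = 0; n < N; n++) {
            /* Only the innermost loop is shared among threads; the outer
             * loops keep their sequential order, so S carries over exactly
             * as it does in the sequential code. */
            #pragma omp parallel for reduction(+:S)
            for (int o = 0; o < O; o++) {
                S += o;
            }
            K[m * N + n] = S;
        }
    }
    /* Sum of 0 .. O-1, added once per (m, n) iteration. */
    double T = (double)O * (O - 1) / 2.0;
    int ok = 1;
    for (int i = 0; i < M * N; i++) {
        if (K[i] != (i + 1) * T) {
            ok = 0;
        }
    }
    free(K);
    return ok;
}
```

The exact comparison works here because all partial sums are integers that a double represents exactly. With general floating-point terms, a reduction may combine the partial sums in a different order than the sequential loop, so results can differ in the last bits and a tolerance-based comparison would be more appropriate.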

Regarding the clarification of "region cannot be closely nested inside 'parallel' region", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66187738/
