openmp - Does ordered have to be at the end?

Reposted by: 行者123 · Updated: 2023-12-03 21:31:28

#pragma omp parallel for ordered
for (int i = 0; i < n; ++i) {
  // ... code happens nicely in parallel here ...
  #pragma omp ordered
  {
    // ... one at a time in order of i, as expected, good ...
  }
  // ... single threaded here but I expected parallel ...
}

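I expected the next thread to enter the ordered section as soon as the current thread leaves it. Instead, the next thread only enters the ordered section once the current thread reaches the end of the for-loop body, so the code after the ordered section runs serially.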

The OpenMP 4.0 manual says:

The ordered construct specifies a structured block in a loop region that will be executed in the order of the loop iterations. This sequentializes and orders the code within an ordered region while allowing code outside the region to run in parallel.

Emphasis mine. I read "outside" as including the code after the end of the ordered region.

Is this expected? Does the ordered section really have to be at the end?

I searched for an answer and found one other place where someone observed something similar almost 2 years ago: https://stackoverflow.com/a/32078625/403310 :

Testing with gfortran 5.2, it appears everything after the ordered region is executed in order for each loop iteration, so having the ordered block at the beginning of the loop leads to serial performance while having the ordered block at the end of the loop does not have this implication as the code before the block is parallelized. Testing with ifort 15 is not as dramatic but I would still recommend structuring your code so your ordered block occurs after any code that needs parallelization in a loop iteration rather than before.
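Following that recommendation, one way to keep the post-region work parallel under libgomp is to move it into a second loop, so that the ordered region is the last statement of the first loop's body. This is a minimal sketch, assuming the post-region work of iteration i is not needed by the ordered region of any later iteration; before_work, after_work and process are hypothetical names standing in for the question's placeholders:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the question's "... code ..." placeholders. */
static double before_work(int i)   { return 2.0 * i; }
static double after_work(double x) { return x + 1.0; }

/* Loop 1: parallel work, then the ordered region as the LAST statement.
 * Loop 2: the former post-region work, as a plain parallel loop.
 * Assumes after_work() for iteration i is not needed by the ordered
 * region of any later iteration. */
void process(int n, double *tmp, double *out, double *seq, size_t *seq_len)
{
    assert(n >= 0);
    *seq_len = 0;

    /* chunk size 1: consecutive iterations go to different threads */
    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < n; ++i) {
        tmp[i] = before_work(i);            /* runs in parallel */
        #pragma omp ordered
        {
            seq[(*seq_len)++] = tmp[i];     /* one at a time, in order of i */
        }
    }

    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        out[i] = after_work(tmp[i]);        /* runs in parallel, no ordered region */
    }
}
```

The price is a second pass over the data and an implicit barrier between the two loops, so this only pays off when the post-region work is substantial.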

I am using gcc 5.4.0 on Ubuntu 16.04.

Many thanks.

Best answer

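The ordered region does not need to be at the end. The behavior you observe is implementation-dependent and is a known flaw in libgomp (the OpenMP runtime library shipped with gcc). I suppose the standard permits this behavior, but it is clearly not optimal.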

Technically, the compiler generates the following code for the annotated loop:

#pragma omp parallel for ordered
for (int i = 0; i < n; ++i) {
  // ... code happens nicely in parallel here ...
  GOMP_ordered_start();
  {
    // ... one at a time in order of i, as expected, good ...
  }
  GOMP_ordered_end();
  // ... single threaded here but I expected parallel ...
  GOMP_loop_ordered_static_next();
}
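To see why the placement matters, here is a toy model in plain C with POSIX threads (not libgomp's code) of an ordered region whose end hook hands off to the next iteration immediately. ordered_start, ordered_end, worker and run_model are hypothetical names, and iterations are dealt out round-robin as with schedule(static, 1). Because ordered_end advances the shared counter right away, the work after the region is not serialized, which is the behavior the question expected. The behavior observed with libgomp is consistent with the handoff happening only when the thread moves on to its next iteration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NTHREADS 4
#define NITERS   16

static atomic_int next_iter;      /* iteration allowed into the ordered region */
static int order_log[NITERS];     /* written only inside the ordered region */
static int log_len;

static void ordered_start(int i) {             /* hypothetical name */
    while (atomic_load(&next_iter) != i)
        sched_yield();                          /* wait for our turn */
}

static void ordered_end(int i) {               /* hypothetical name */
    assert(atomic_load(&next_iter) == i);
    /* Hand off now: iteration i + 1 may enter while this thread keeps
     * running its post-region code. */
    atomic_store(&next_iter, i + 1);
}

static void *worker(void *arg) {
    int tid = *(int *)arg;
    for (int i = tid; i < NITERS; i += NTHREADS) {  /* like schedule(static, 1) */
        /* ... parallel work before the region ... */
        ordered_start(i);
        order_log[log_len++] = i;               /* one at a time, in order of i */
        ordered_end(i);
        /* ... parallel work after the region: overlaps with the next region ... */
    }
    return NULL;
}

int run_model(void) {
    pthread_t th[NTHREADS];
    int ids[NTHREADS];
    atomic_store(&next_iter, 0);
    log_len = 0;
    for (int t = 0; t < NTHREADS; ++t) {
        ids[t] = t;
        pthread_create(&th[t], NULL, worker, &ids[t]);
    }
    for (int t = 0; t < NTHREADS; ++t)
        pthread_join(th[t], NULL);
    return log_len;                             /* number of ordered entries */
}
```

The sequentially consistent atomic store and load on next_iter order each iteration's writes to order_log before the next iteration's, so the shared log needs no extra lock.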

Unfortunately, GOMP_ordered_end is implemented as follows:

/* This function is called by user code when encountering the end of an
   ORDERED block.  With the current ORDERED implementation there's nothing
   for us to do.

   However, the current implementation has a flaw in that it does not allow
   the next thread into the ORDERED section immediately after the current
   thread exits the ORDERED section in its last iteration.  The existance
   of this function allows the implementation to change.  */

void
GOMP_ordered_end (void)
{
}

I suspect this has never been an important use case, because ordered is probably typically used along these lines:

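#pragma omp parallel for ordered
for (...) {
    // declared inside the body so each iteration gets its own copy
    double result = expensive_computation();  // result type is illustrative
    #pragma omp ordered
    {
        append(results, result);
    }
}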
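A runnable version of that pattern, as a sketch: expensive_computation here is a hypothetical stand-in (a sum of squares), compute_all is a hypothetical name, and results is a plain array that the ordered region fills in iteration order. The expensive part runs in parallel; only the append is serialized, and it is the last statement in the body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for expensive_computation(): sum of k*k for k = 0..i. */
static long expensive_computation(int i) {
    long acc = 0;
    for (int k = 0; k <= i; ++k)
        acc += (long)k * k;
    return acc;
}

/* Fills results[0..n) in iteration order and returns the number appended. */
size_t compute_all(int n, long *results) {
    assert(n >= 0);
    size_t count = 0;   /* shared; only modified inside the ordered region */

    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < n; ++i) {
        long result = expensive_computation(i);  /* private: declared in the body */
        #pragma omp ordered
        {
            results[count++] = result;           /* "append", in order of i */
        }
    }
    return count;
}
```

If result were declared before the loop instead, it would be shared by all threads and the values appended could be overwritten by other iterations.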

The OpenMP runtime from the Intel compiler does not have this flaw.

Regarding "openmp - Does ordered have to be at the end?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43540605/
