c++ - Difference between #pragma omp parallel and #pragma omp parallel for

Reposted by: 行者123 Updated: 2023-12-03 13:15:52

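I am new to OpenMP and have been trying to run a program that adds two arrays using OpenMP. In the OpenMP tutorial, I learned that we need to use #pragma omp parallel for when using OpenMP on a for loop. But I have also tried the same thing with #pragma omp parallel, and it also gives me the correct output. Below are the code snippets showing what I mean.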

#pragma omp parallel for
for (int i = 0; i < n; i++)
{
    c[i] = a[i] + b[i];
}

and

#pragma omp parallel
{
    for (int i = 0; i < n; i++)
    {
        c[i] = a[i] + b[i];
    }
}

What is the difference between these two?

Best answer

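#pragma omp parallel:

Will create a parallel region with a team of threads, where each thread will execute the entire block of code that the parallel region encloses.

From the OpenMP 5.1 specification one can read a more formal description: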

When a thread encounters a parallel construct, a team of threads is created to execute the parallel region (..). The thread that encountered the parallel construct becomes the primary thread of the new team, with a thread number of zero for the duration of the new parallel region. All threads in the new team, including the primary thread, execute the region. Once the team is created, the number of threads in the team remains constant for the duration of that parallel region.
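The behaviour described in this quote can be checked with a small sketch. It assumes a compiler with OpenMP enabled (for example the -fopenmp flag of GCC or Clang); the names RegionStats and run_parallel_region are hypothetical, and the serial fallback for omp_get_num_threads is an addition so that the code also builds without OpenMP. It counts how many times the body of a parallel region runs and compares that count with the team size.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback (an assumption of this sketch) for builds without OpenMP.
inline int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper type for this illustration.
struct RegionStats {
    int executions;  // how many times the region body ran
    int team_size;   // number of threads in the team
};

RegionStats run_parallel_region() {
    int executions = 0;
    int team_size = 1;

    #pragma omp parallel
    {
        // Every thread of the team executes this block once.
        #pragma omp atomic
        executions++;

        // Only one thread records the team size.
        #pragma omp single
        team_size = omp_get_num_threads();
    }

    // One execution of the body per thread in the team.
    assert(executions == team_size);
    return RegionStats{executions, team_size};
}
```

Calling run_parallel_region() with, say, OMP_NUM_THREADS=4 should report the same value for both fields, since the body runs once per thread.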

#pragma omp parallel for:

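Will create a parallel region (as described before), and the iterations of the loop that it encloses will be assigned to the threads of that region, using the default chunk size and the default schedule, which is typically static. Bear in mind, however, that the default schedule might differ among different concrete implementations of the OpenMP standard.

From the OpenMP 5.1 specification you can read a more formal description: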

The worksharing-loop construct specifies that the iterations of one or more associated loops will be executed in parallel by threads in the team in the context of their implicit tasks. The iterations are distributed across threads that already exist in the team that is executing the parallel region to which the worksharing-loop region binds.

Moreover,

The parallel loop construct is a shortcut for specifying a parallel construct containing a loop construct with one or more associated loops and no other statements.

Or informally, #pragma omp parallel for is a combination of the construct #pragma omp parallel with #pragma omp for. In your case, this would mean that:

#pragma omp parallel for
for (int i = 0; i < n; i++)
{
    c[i] = a[i] + b[i];
}

is semantically, and logically, the same as:

#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < n; i++)
    {
        c[i] = a[i] + b[i];
    }
}
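As a minimal sketch of this equivalence, the two functions below perform the same array addition, once with the combined construct and once with the split form. The function names add_combined and add_split are hypothetical, std::vector is used only for convenience, and OpenMP is assumed to be enabled at compile time (without it the pragmas are ignored and both loops simply run serially).

```cpp
#include <cassert>
#include <vector>

// Combined construct: one directive creates the team and divides the loop.
std::vector<int> add_combined(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    std::vector<int> c(n);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
    return c;
}

// Split form: the parallel construct creates the team,
// and the for construct divides the iterations among its threads.
std::vector<int> add_split(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    std::vector<int> c(n);
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++) {
            c[i] = a[i] + b[i];
        }
    }
    return c;
}
```

Both functions should return identical vectors for the same inputs; the split form is mainly useful when the parallel region has to contain other statements besides the loop.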

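TL;DR: In your example, with #pragma omp parallel for the loop will be parallelized among the threads (i.e., the loop iterations will be divided among the threads), whereas with #pragma omp parallel all threads will execute (in parallel) all the loop iterations.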

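To make it more illustrative, with 4 threads #pragma omp parallel would result in something like this: each of threads 0, 1, 2, and 3 executes every iteration i = 0, 1, ..., n-1.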

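whereas #pragma omp parallel for with chunk_size=1 and a static schedule would result in something like this: thread 0 executes iterations 0, 4, 8, ...; thread 1 executes 1, 5, 9, ...; thread 2 executes 2, 6, 10, ...; and thread 3 executes 3, 7, 11, ...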
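The distribution for a static schedule with a chunk size of 1 can be observed directly. The following is a sketch under stated assumptions: Ownership and record_owners are hypothetical names, the serial fallbacks are additions so the code also builds without OpenMP, and the schedule is requested explicitly with schedule(static, 1) rather than relying on the implementation's default. It records which thread executed each iteration.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks (an assumption of this sketch) for builds without OpenMP.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper type: owner[i] is the thread number that ran iteration i.
struct Ownership {
    int team_size;
    std::vector<int> owner;
};

Ownership record_owners(int n) {
    assert(n >= 0);
    int team_size = 1;
    std::vector<int> owner(n, -1);

    #pragma omp parallel
    {
        #pragma omp single
        team_size = omp_get_num_threads();

        // Chunks of one iteration are handed out round-robin by thread number.
        #pragma omp for schedule(static, 1)
        for (int i = 0; i < n; i++) {
            owner[i] = omp_get_thread_num();
        }
    }
    return Ownership{team_size, owner};
}
```

With this explicit schedule, iteration i should be recorded as executed by thread i % team_size.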

Code-wise, the loop would be transformed into something logically similar to:

for(int i=omp_get_thread_num(); i < n; i+=omp_get_num_threads())
{  
    c[i]=a[i]+b[i];
}

where omp_get_thread_num():

The omp_get_thread_num routine returns the thread number, within the current team, of the calling thread.

and omp_get_num_threads():

Returns the number of threads in the current team. In a sequential section of the program omp_get_num_threads returns 1.

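Or in other words, for(int i = THREAD_ID; i < n; i += TOTAL_THREADS), with THREAD_ID ranging from 0 to TOTAL_THREADS - 1, and TOTAL_THREADS representing the total number of threads of the team created for the parallel region.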
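For completeness, here is how that hand-written distribution might look inside a plain parallel region. This is only a sketch of the logical transformation, not what a compiler actually generates; add_cyclic is a hypothetical name and the serial fallbacks are additions for builds without OpenMP.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks (an assumption of this sketch) for builds without OpenMP.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_num_threads() { return 1; }
#endif

// Adds two arrays, dividing the iterations among threads by hand.
std::vector<int> add_cyclic(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    std::vector<int> c(n);

    #pragma omp parallel
    {
        const int thread_id = omp_get_thread_num();       // 0 .. total_threads - 1
        const int total_threads = omp_get_num_threads();  // size of the team

        // Each thread starts at its own number and strides by the team size,
        // so every index is written by exactly one thread.
        for (int i = thread_id; i < n; i += total_threads) {
            c[i] = a[i] + b[i];
        }
    }
    return c;
}
```

Because the index sets of the threads are disjoint, no synchronization is needed on c.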

I have learned that we need to use #pragma omp parallel for while using OpenMP on the for loop. But I have also tried the same thing with #pragma omp parallel and it is also giving me the correct output.

It will give you the same output, because in your code:

 c[i]=a[i]+b[i];

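the arrays a and b are only read, and c[i] is the only element being updated, and its value does not depend on how many times iteration i is executed. Nevertheless, with #pragma omp parallel for each thread will update its own values of i, whereas with #pragma omp parallel all threads will be updating the same values of i, hence overwriting each other's values. Formally, those concurrent unsynchronized writes to the same element are a data race, even though every thread writes the same value.

Now try to do the same with the following code: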

#pragma omp parallel for
for (int i = 0; i < n; i++)
{
    c[i] = c[i] + a[i] + b[i];
}

and

#pragma omp parallel
{
    for (int i = 0; i < n; i++)
    {
        c[i] = c[i] + a[i] + b[i];
    }
}

You will immediately notice the difference.
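A plain parallel region around this accumulating loop contains a data race, so its result is not predictable. To make the difference observable in a deterministic way, the sketch below protects the update in the replicated version with #pragma omp atomic, which is a change made for this illustration and not part of the original snippets. The names Accumulated, accumulate_divided, and accumulate_replicated are hypothetical, and the serial fallback is an addition for builds without OpenMP.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback (an assumption of this sketch) for builds without OpenMP.
inline int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper type for this illustration.
struct Accumulated {
    int team_size;
    std::vector<int> c;
};

// Iterations are divided: each element is updated exactly once.
std::vector<int> accumulate_divided(std::vector<int> c, const std::vector<int>& a,
                                    const std::vector<int>& b) {
    assert(c.size() == a.size() && a.size() == b.size());
    const int n = static_cast<int>(c.size());
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        c[i] = c[i] + a[i] + b[i];
    }
    return c;
}

// Iterations are replicated: every thread updates every element.
Accumulated accumulate_replicated(std::vector<int> c, const std::vector<int>& a,
                                  const std::vector<int>& b) {
    assert(c.size() == a.size() && a.size() == b.size());
    const int n = static_cast<int>(c.size());
    int team_size = 1;
    int* out = c.data();

    #pragma omp parallel
    {
        #pragma omp single
        team_size = omp_get_num_threads();

        for (int i = 0; i < n; i++) {
            // The atomic update removes the race but keeps the repeated additions.
            #pragma omp atomic
            out[i] += a[i] + b[i];
        }
    }
    return Accumulated{team_size, c};
}
```

With the divided loop each element grows by a[i] + b[i] once; with the replicated loop it grows by that amount once per thread in the team, so the two results differ as soon as more than one thread is used.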

Regarding c++ - Difference between #pragma omp parallel and #pragma omp parallel for, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65247801/
