
multithreading - OpenMP: Splitting a loop based on NUMA

Reposted. Author: 行者123. Updated: 2023-12-04 04:33:21

I am running the following loop with 8 OpenMP threads:

float* data;
int n;

#pragma omp parallel for schedule(dynamic, 1) default(none) shared(data, n)
for ( int i = 0; i < n; ++i )
{
    /* DO SOMETHING WITH data[i] */
}

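Because of NUMA, I would like to run the first half of the loop (i = 0, ..., n/2-1) with threads 0, 1, 2, 3 and the second half (i = n/2, ..., n-1) with threads 4, 5, 6, 7.

Essentially, I want to run two loops in parallel, each using its own group of OpenMP threads.

How can I achieve this with OpenMP?

Thanks

PS: Ideally, if one group of threads finishes its half of the loop while the other half is still unfinished, I would like the threads of the finished group to join the unfinished group and help process the other half.

I was thinking of something like the following, but I wonder whether I can do this with OpenMP without the extra bookkeeping: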
int n;
int i0 = 0;
int i1 = n / 2;

#pragma omp parallel for schedule(dynamic, 1) default(none) shared(data,n,i0,i1)
for ( int i = 0; i < n; ++i )
{
    int nt = omp_get_thread_num();
    int j;
    #pragma omp critical
    {
        if ( nt < 4 ) {
            if ( i0 < n / 2 ) j = i0++; // First 4 threads process first half
            else              j = i1++; // of loop unless first half is finished
        }
        else {
            if ( i1 < n ) j = i1++; // Second 4 threads process second half
            else          j = i0++; // of loop unless second half is finished
        }
    }

    /* DO SOMETHING WITH data[j] */
}
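
The bookkeeping in the draft above can also be done with one atomic counter per half instead of a critical section. What follows is only a minimal sketch under assumptions that are not in the question: the names `take` and `split_loop_helping` are hypothetical, the team is assumed to have 8 threads with the lower half of the thread numbers forming group 0, and doubling `data[j]` stands in for the real work. It does not bind threads to NUMA nodes by itself. The `_OPENMP` guards only let it compile and run serially when OpenMP is disabled.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: claim the next index of half h, or return -1 if
   that half has no unclaimed indices left. */
static int take(int *next, const int *end, int h)
{
    int j;
    #pragma omp atomic capture
    j = next[h]++;
    return j < end[h] ? j : -1;
}

/* Each thread drains its own half first, then helps with the other half. */
void split_loop_helping(float *data, int n)
{
    assert(data != NULL && n >= 0);
    int next[2] = { 0, n / 2 };   /* next unclaimed index per half */
    int end[2]  = { n / 2, n };   /* one past the last index per half */

    #pragma omp parallel num_threads(8)
    {
        int group = 0;            /* serial fallback: one thread, group 0 */
#ifdef _OPENMP
        group = omp_get_thread_num() < omp_get_num_threads() / 2 ? 0 : 1;
#endif
        int h = group;            /* half this thread is currently serving */
        for (;;) {
            int j = take(next, end, h);
            if (j >= 0) {
                data[j] *= 2.0f;  /* placeholder for the real work */
            } else if (h == group) {
                h = 1 - group;    /* own half is finished: help the other */
            } else {
                break;            /* both halves are finished */
            }
        }
    }
}
```

Each index is handed out exactly once because the counter increment and the read of its old value happen in a single atomic step; the counters may overshoot the end of a half by at most one per thread, which is harmless here.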

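Best answer

Probably the best approach is nested parallelism, first over the NUMA nodes and then within each node; you can then keep using the dynamic scheduling machinery within each group while still splitting the data between thread groups: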

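#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {

    const int ngroups = 2;     /* one group per NUMA node */
    const int npergroup = 4;   /* threads in each group */
    const int ndata = 16;

    /* Enable nested parallel regions (deprecated since OpenMP 5.0 in
       favour of omp_set_max_active_levels). */
    omp_set_nested(1);

    /* Outer level: one thread per group. */
    #pragma omp parallel for num_threads(ngroups)
    for (int i=0; i<ngroups; i++) {
        /* Contiguous slice of the data owned by group i. */
        int start = (ndata*i+(ngroups-1))/ngroups;
        int end   = (ndata*(i+1)+(ngroups-1))/ngroups;

        /* Inner level: dynamic scheduling within the group's slice. */
        #pragma omp parallel for num_threads(npergroup) shared(i, start, end) schedule(dynamic,1)
        for (int j=start; j<end; j++) {
            printf("Thread %d from group %d working on data %d\n", omp_get_thread_num(), i, j);
        }
    }

    return 0;
}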

Running this gives:
$ gcc -fopenmp -o nested nested.c -Wall -O -std=c99
$ ./nested | sort -n -k 9
Thread 0 from group 0 working on data 0
Thread 3 from group 0 working on data 1
Thread 1 from group 0 working on data 2
Thread 2 from group 0 working on data 3
Thread 1 from group 0 working on data 4
Thread 3 from group 0 working on data 5
Thread 3 from group 0 working on data 6
Thread 0 from group 0 working on data 7
Thread 0 from group 1 working on data 8
Thread 3 from group 1 working on data 9
Thread 2 from group 1 working on data 10
Thread 1 from group 1 working on data 11
Thread 0 from group 1 working on data 12
Thread 0 from group 1 working on data 13
Thread 2 from group 1 working on data 14
Thread 0 from group 1 working on data 15

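Note, however, that the nested approach may well change how threads are assigned to cores compared with single-level threading, so you will probably have to experiment a bit more with KMP_AFFINITY or other mechanisms to get the binding right again.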
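
As one portable alternative to KMP_AFFINITY (which is specific to Intel-derived OpenMP runtimes), OpenMP 4.0 and later offer the `proc_bind` clause together with the `OMP_PLACES` and `OMP_PROC_BIND` environment variables. The sketch below is an illustration under assumptions, not part of the original answer: the names `split_loop_bound` and `group_of` are hypothetical, recording the group number stands in for the real work, and whether the resulting places actually coincide with NUMA nodes depends on the runtime and the machine. The `_OPENMP` guards only let it compile serially when OpenMP is disabled.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

enum { NGROUPS = 2, NPERGROUP = 4 };

/* Two-level loop with binding hints; stores in group_of[j] the group
   that processed element j. */
void split_loop_bound(int ndata, int *group_of)
{
    assert(ndata >= 0 && group_of != NULL);
#ifdef _OPENMP
    omp_set_max_active_levels(2);   /* allow two nested parallel levels */
#endif
    /* spread: ask the runtime to place the group leaders far apart */
    #pragma omp parallel for num_threads(NGROUPS) proc_bind(spread)
    for (int g = 0; g < NGROUPS; g++) {
        int start = (ndata * g + (NGROUPS - 1)) / NGROUPS;
        int end   = (ndata * (g + 1) + (NGROUPS - 1)) / NGROUPS;

        /* close: ask for each group's threads to stay near their leader */
        #pragma omp parallel for num_threads(NPERGROUP) proc_bind(close) schedule(dynamic, 1)
        for (int j = start; j < end; j++) {
            group_of[j] = g;        /* placeholder for the real work */
        }
    }
}
```

Setting `OMP_PLACES=sockets` in the environment before running is one way to define the places the clauses refer to, assuming one NUMA node per socket; it is worth checking the actual placement on the target machine rather than relying on it.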

This post, multithreading - OpenMP: Splitting a loop based on NUMA, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/24957781/
