
c - Performance improvement with OpenMP

Reposted. Author: 行者123. Updated: 2023-11-30 19:42:09

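Below is a program that computes the factorial of each of `num` random numbers (0 to 9). Compared with the serial version, the parallel version performs much worse, even for large inputs. What is the right approach to improving performance with OpenMP, and how can the OpenMP-parallelized code be optimized further?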

Code -

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include <time.h>

int main(void)
{
    int i,j,k,num,thread;
    int *arr,*result,temp;
    time_t t;
    srand((unsigned)time(&t));
    scanf("%d",&num);
    arr = (int*)malloc(sizeof(int)*num);
    result = (int*)malloc(sizeof(int)*num);

    for(i=0;i<num;i++){
        arr[i]=rand()%10;
    }

    for(i=0;i<num;i++){
        result[i]=1;
    }

    clock_t begin, end;
    double time_spent_omp;
    double time_spent;

    begin = clock();
    /* here, do your time-consuming job */

    #pragma omp parallel for private(temp)
    for(j=0;j<num;j++){
        temp = arr[j];
        for(i=0;i<temp;temp--)
            result[j]=result[j]*temp;
    }


    end = clock();
    time_spent_omp = (double)(end - begin) / CLOCKS_PER_SEC;

    /*
    for(i=0;i<num;i++){
        printf("%d\t%d\n",arr[i],result[i]);
    }*/

    for(i=0;i<num;i++){
        result[i]=1;
    }

    begin = clock();

    for(j=0;j<num;j++){
        temp = arr[j];
        for(i=0;i<temp;temp--)
            result[j]=result[j]*temp;
    }

    end = clock();
    time_spent = (double)(end - begin)/ CLOCKS_PER_SEC;

    /*
    for(i=0;i<num;i++){
        printf("%d\t%d\n",arr[i],result[i]);
    }*/

    printf("Time for serial is %f\nTime for openMP is %f\n",time_spent, time_spent_omp);

    return 0;
}

Output -

rnt@rnt-laptop:~/Desktop/C$ gcc -fopenmp -o fact fact.c
rnt@rnt-laptop:~/Desktop/C$ ./fact
5
Time for serial is 0.000004
Time for openMP is 0.006214
rnt@rnt-laptop:~/Desktop/C$ ./fact
11
Time for serial is 0.000013
Time for openMP is 0.000391
rnt@rnt-laptop:~/Desktop/C$ ./fact
111
Time for serial is 0.000078
Time for openMP is 0.000507
rnt@rnt-laptop:~/Desktop/C$ ./fact
1111
Time for serial is 0.000454
Time for openMP is 0.000860
rnt@rnt-laptop:~/Desktop/C$ ./fact
11111
Time for serial is 0.002947
Time for openMP is 0.004829
rnt@rnt-laptop:~/Desktop/C$ ./fact
111111
Time for serial is 0.022903
Time for openMP is 0.044273
rnt@rnt-laptop:~/Desktop/C$ ./fact
1111111
Time for serial is 0.030446
Time for openMP is 0.160402
rnt@rnt-laptop:~/Desktop/C$ ./fact
11111111
Time for serial is 0.298610
Time for openMP is 1.580710
rnt@rnt-laptop:~/Desktop/C$ ./fact
111111111
Time for serial is 2.993646
Time for openMP is 13.202524

Best answer

Try using:

#pragma omp parallel for private(temp) schedule(static,XX) 

with several values of XX, for example 10, 50, 100, 1000, etc.

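When no schedule clause is given, the schedule is implementation-defined (GCC, for example, uses static). Dynamic scheduling is better for parallel code with load-balance problems between iterations/cores, but it adds run-time overhead for handing out chunks.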

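Edit: you also need to make the inner loop iterator `i` private (the `parallel for` loop index `j` is made private automatically); otherwise all threads share `i`, which is a data race. You can try different schedules and vary their parameters to tune thread performance.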

Regarding "c - Performance improvement with OpenMP", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32747096/
