c - Further speeding up an OpenMP reduction loop

Repost. Author: 行者123. Last updated: 2023-11-30 16:57:48

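Is there a better way to perform the reduction, or to further improve the performance of this code? Should I use the collapse clause? MAX simply returns the larger of two values, EPS is just a float, and fdm is a struct.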

    void cal_beta(float *beta, float **gd0, float **gd1, float **cg, fdm2d fdm)
    /*< calculate beta for nonlinear conjugate gradient algorithm >*/
    {
        int ix, iz;
        float a, b, c;
        a = 0.0, b = 0.0, c = 0.0;
    #ifdef _OPENMP
    #pragma omp parallel for private(ix,iz) \
        schedule(static,fdm->ompchunk) shared(gd0,gd1,cg) \
        reduction(+:a,b,c)
    #endif
        for (ix = 0; ix < fdm->nxpad; ix++) {
            for (iz = 0; iz < fdm->nzpad; iz++) {
                a += gd1[ix][iz] * (gd1[ix][iz] - gd0[ix][iz]);
                b += cg[ix][iz] * (gd1[ix][iz] - gd0[ix][iz]);
                c += gd1[ix][iz] * gd1[ix][iz];
            }
        }
        float beta_HS = 0.0;
        float beta_DY = 0.0;
        if (fabs(b) > EPS) {
            beta_HS = a / b;
            beta_DY = c / b;
        }
        /* Hybrid HS-DY method combined with iteration restart */
        *beta = MAX(0.0, MIN(beta_HS, beta_DY));
    }

Best answer

You could consider using the collapse clause to merge the two loops into a single parallel loop region. According to the OpenMP 4.5 manual, page 58, lines 25-29:

The collapse clause may be used to specify how many loops are associated with the loop construct. The parameter of the collapse clause must be a constant positive integer expression. If a collapse clause is specified with a parameter value greater than 1, then the iterations of the associated loops to which the clause applies are collapsed into one larger iteration space that is then divided according to the schedule clause.
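
As a rough illustration of what that could look like for the loop in the question, here is a minimal sketch. It is not the asker's actual code: the name cal_beta_collapsed, the explicit nx/nz parameters (standing in for fdm->nxpad and fdm->nzpad), and the values of EPS, MAX and MIN are assumptions made so the example compiles on its own.

```c
#include <assert.h>
#include <math.h>

/* Assumed definitions: the question only says EPS is a float and MAX
 * returns the larger of two values. */
#define EPS 1e-7f
#define MAX(x, y) ((x) > (y) ? (x) : (y))
#define MIN(x, y) ((x) < (y) ? (x) : (y))

/* Hypothetical variant of cal_beta: nx and nz replace the fdm struct fields. */
void cal_beta_collapsed(float *beta, float **gd0, float **gd1, float **cg,
                        int nx, int nz)
{
    assert(beta && gd0 && gd1 && cg);

    float a = 0.0f, b = 0.0f, c = 0.0f;   /* reduction accumulators */

    /* collapse(2) fuses the ix and iz loops into one iteration space of
     * nx*nz iterations, which is then divided by the schedule clause.
     * The loops must be perfectly nested, with nothing between the two
     * for statements. */
#ifdef _OPENMP
#pragma omp parallel for collapse(2) schedule(static) reduction(+:a,b,c)
#endif
    for (int ix = 0; ix < nx; ix++) {
        for (int iz = 0; iz < nz; iz++) {
            float d = gd1[ix][iz] - gd0[ix][iz];   /* shared subexpression */
            a += gd1[ix][iz] * d;
            b += cg[ix][iz] * d;
            c += gd1[ix][iz] * gd1[ix][iz];
        }
    }

    float beta_HS = 0.0f;
    float beta_DY = 0.0f;
    if (fabsf(b) > EPS) {
        beta_HS = a / b;
        beta_DY = c / b;
    }
    /* Hybrid HS-DY method combined with iteration restart */
    *beta = MAX(0.0f, MIN(beta_HS, beta_DY));
}
```

Whether this is actually faster is something to measure. If nxpad is already much larger than the number of threads, the outer loop alone probably provides enough iterations to balance, and collapsing mostly adds index arithmetic. Also note that a parallel floating-point reduction adds the partial sums in a different order than the serial loop, so the result can differ in the last few bits.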

Regarding "c - Further speeding up an OpenMP reduction loop", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39333696/
