gpt4 book ai didi

c - switch 语句是否比 for 循环更快?

转载 作者:太空宇宙 更新时间:2023-11-04 00:24:20 24 4
gpt4 key购买 nike

我在看稀疏束调整库的源代码(sba) Lourakis & Argyros 着。更准确地说,我正在查看以下函数 nrmL2xmy,它计算两个 vector 的平方 L2 差。以下代码是从 sba_levmar.c 文件中复制的,从 146 行开始:

/* Compute e=x-y for two n-vectors x and y and return the squared L2 norm of e.
* e can coincide with either x or y.
* Uses loop unrolling and blocking to reduce bookkeeping overhead & pipeline
* stalls and increase instruction-level parallelism; see http://www.abarnett.demon.co.uk/tutorial.html
*/
static double nrmL2xmy(double *const e, const double *const x, const double *const y, const int n)
{
const int blocksize=8, bpwr=3; /* 8=2^3 */
register int i;
int j1, j2, j3, j4, j5, j6, j7;
int blockn;
register double sum0=0.0, sum1=0.0, sum2=0.0, sum3=0.0;

/* n may not be divisible by blocksize,
* go as near as we can first, then tidy up.
*/
blockn = (n>>bpwr)<<bpwr; /* (n / blocksize) * blocksize; */

/* unroll the loop in blocks of `blocksize'; looping downwards gains some more speed */
for(i=blockn-1; i>0; i-=blocksize){
e[i ]=x[i ]-y[i ]; sum0+=e[i ]*e[i ];
j1=i-1; e[j1]=x[j1]-y[j1]; sum1+=e[j1]*e[j1];
j2=i-2; e[j2]=x[j2]-y[j2]; sum2+=e[j2]*e[j2];
j3=i-3; e[j3]=x[j3]-y[j3]; sum3+=e[j3]*e[j3];
j4=i-4; e[j4]=x[j4]-y[j4]; sum0+=e[j4]*e[j4];
j5=i-5; e[j5]=x[j5]-y[j5]; sum1+=e[j5]*e[j5];
j6=i-6; e[j6]=x[j6]-y[j6]; sum2+=e[j6]*e[j6];
j7=i-7; e[j7]=x[j7]-y[j7]; sum3+=e[j7]*e[j7];
}

/*
* There may be some left to do.
* This could be done as a simple for() loop,
* but a switch is faster (and more interesting)
*/

i=blockn;
if(i<n){
/* Jump into the case at the place that will allow
* us to finish off the appropriate number of items.
*/
switch(n - i){
case 7 : e[i]=x[i]-y[i]; sum0+=e[i]*e[i]; ++i;
case 6 : e[i]=x[i]-y[i]; sum0+=e[i]*e[i]; ++i;
case 5 : e[i]=x[i]-y[i]; sum0+=e[i]*e[i]; ++i;
case 4 : e[i]=x[i]-y[i]; sum0+=e[i]*e[i]; ++i;
case 3 : e[i]=x[i]-y[i]; sum0+=e[i]*e[i]; ++i;
case 2 : e[i]=x[i]-y[i]; sum0+=e[i]*e[i]; ++i;
case 1 : e[i]=x[i]-y[i]; sum0+=e[i]*e[i]; ++i;
}
}

return sum0+sum1+sum2+sum3;
}

在代码中间(大致),作者声明如下:

 /*
* There may be some left to do.
* This could be done as a simple for() loop,
* but a switch is faster (and more interesting)
*/

我不明白为什么 switch 比简单的 for 循环更快。

所以我的问题是:这个说法是真的吗?如果是,为什么

最佳答案

在这种情况下,switch 会更快,因为循环会多次检查结束条件,而 switch 只会检查一次。这叫做 loop unrolling ,并且优化编译器在很大程度上自行完成。

关于c - switch 语句是否比 for 循环更快?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31878634/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com