
CUDA C Best Practices: unsigned vs signed optimization

Reposted. Author: 太空狗. Updated: 2023-10-29 17:00:41

The CUDA C Best Practices Guide has a short section on the use of signed and unsigned integers.

In the C language standard, unsigned integer overflow semantics are well defined, whereas signed integer overflow causes undefined results. Therefore, the compiler can optimize more aggressively with signed arithmetic than it can with unsigned arithmetic. This is of particular note with loop counters: since it is common for loop counters to have values that are always positive, it may be tempting to declare the counters as unsigned. For slightly better performance, however, they should instead be declared as signed.

For example, consider the following code:

    for (i = 0; i < n; i++) {
        out[i] = in[offset + stride*i];
    }

Here, the sub-expression stride*i could overflow a 32-bit integer, so if i is declared as unsigned, the overflow semantics prevent the compiler from using some optimizations that might otherwise have applied, such as strength reduction. If instead i is declared as signed, where the overflow semantics are undefined, the compiler has more leeway to use these optimizations.

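The first two sentences in particular confuse me. If the semantics of unsigned values are well defined and signed values can produce undefined results, how can the compiler generate better code for the latter?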

Best answer

The text shows this example:

    for (i = 0; i < n; i++) {
        out[i] = in[offset + stride*i];
    }

It also mentions "strength reduction". The compiler is allowed to replace the loop with the following "pseudo-optimized C" code:

    tmp = offset;
    for (i = 0; i < n; i++) {
        out[i] = in[tmp];
        tmp += stride;
    }
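As a minimal sketch of that transformation, the two loops can be written as ordinary C functions and compared. The function names `gather_indexed` and `gather_reduced` are hypothetical, and the sketch assumes every index, including `offset + stride*n`, fits in an `int`, so no overflow occurs.

```c
#include <assert.h>

/* Straightforward form: one multiplication per iteration. */
void gather_indexed(int *out, const int *in, int offset, int stride, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        out[i] = in[offset + stride * i];
    }
}

/* Strength-reduced form: the multiplication is replaced by a running sum.
 * Assumes offset + stride*n fits in an int, so tmp never overflows. */
void gather_reduced(int *out, const int *in, int offset, int stride, int n)
{
    assert(n >= 0);
    int tmp = offset;
    for (int i = 0; i < n; i++) {
        out[i] = in[tmp];
        tmp += stride;
    }
}
```

Under those assumptions both functions read the same elements of `in`, which is what makes the replacement legal for the compiler.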

Now, imagine a processor that only supports floating-point numbers (with integers as a subset). tmp would then have a "very big number" type.

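Now, the C standard says that a computation involving unsigned operands can never overflow; instead, the result is reduced modulo the largest representable value + 1. This means that when i is unsigned, the compiler has to do this: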

    tmp = offset;
    for (i = 0; i < n; i++) {
        out[i] = in[tmp];
        tmp += stride;
        if (tmp > UINT_MAX)
        {
            tmp -= UINT_MAX + 1;
        }
    }
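The wrap-around can be made concrete with a small sketch. It assumes 32-bit unsigned operands (and an `int` no wider than 32 bits) and uses a 64-bit accumulator to stand in for the "very big number" type; the function names are hypothetical. With `stride = 0x80000000` and `i = 2`, the index C requires is 0, while a wide accumulator that is never reduced reaches 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Index as C defines it for 32-bit unsigned operands: reduced modulo 2^32. */
uint32_t index_wrapped(uint32_t offset, uint32_t stride, uint32_t i)
{
    return offset + stride * i;
}

/* Running sum in a wider type with no reduction: it can diverge from
 * index_wrapped once the 32-bit result would have wrapped. */
uint64_t index_accumulated_wide(uint32_t offset, uint32_t stride, uint32_t i)
{
    uint64_t tmp = offset;
    for (uint32_t k = 0; k < i; k++) {
        tmp += stride;
    }
    return tmp;
}

/* Running sum with the explicit reduction step from the loop above. */
uint32_t index_accumulated_reduced(uint32_t offset, uint32_t stride, uint32_t i)
{
    uint64_t tmp = offset;
    for (uint32_t k = 0; k < i; k++) {
        tmp += stride;
        if (tmp > UINT32_MAX) {
            tmp -= (uint64_t)UINT32_MAX + 1;
        }
    }
    assert(tmp <= UINT32_MAX); /* invariant kept by the reduction step */
    return (uint32_t)tmp;
}
```

Only the reduced version always agrees with `index_wrapped`, which is why the extra check cannot simply be dropped when the counter is unsigned.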

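With signed integers, however, the compiler can do whatever it wants. It does not need to check for overflow: if overflow does occur, that is the developer's problem (it may cause an exception or produce a wrong value). So the code can be faster.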

A similar question about CUDA C Best Practices: unsigned vs signed optimization can be found on Stack Overflow: https://stackoverflow.com/questions/14105958/
