gpt4 book ai didi

c++ - 是否有一些有意义的统计数据来证明保持有符号整数算术溢出未定义?

转载 作者:IT老高 更新时间:2023-10-28 12:34:49 24 4
gpt4 key购买 nike

C 标准明确指定有符号整数溢出具有未定义行为。然而,大多数 CPU 使用已定义的溢出语义实现有符号算术(除法溢出可能除外:x/0INT_MIN/-1)。

编译器编写者一直在利用这种溢出的未定义性来添加更积极的优化,这些优化往往会以非常微妙的方式破坏遗留代码。例如,此代码可能在较旧的编译器上工作,但在当前版本的 gccclang 上不再适用:

/* Increment a by a value in 0..255, clamp a to positive integers.
The code relies on 32-bit wrap-around, but the C Standard makes
signed integer overflow undefined behavior, so sum_max can now
return values less than a. There are Standard compliant ways to
implement this, but legacy code is what it is... */
int sum_max(int a, unsigned char b) {
int res = a + b;
return (res >= a) ? res : INT_MAX;
}

是否有确凿的证据表明这些优化是值得的?是否有比较研究记录了实际示例甚至经典基准的实际改进?

我在看这个的时候想出了这个问题:C++Now 2018: John Regehr “Closing Keynote: Undefined Behavior and Compiler Optimizations”

我正在标记 cc++,因为这两种语言的问题相似,但答案可能不同。

最佳答案

我不了解研究和统计数据,但是是的,考虑到编译器实际所做的这一点,肯定有优化。是的,它们非常重要(例如 tldr 循环矢量化)。

除了编译器优化之外,还有另一个方面需要考虑。使用 UB,您可以获得 C/C++ 有符号整数的算术行为,就像您在数学上所期望的那样。例如 x + 10 > x现在成立(当然对于有效的代码),但不会在环绕行为。

我发现了一篇很棒的文章 How undefined signed overflow enables optimizations in GCC来自 Krister Walfridsson 的博客列出了一些将签名溢出 UB 考虑在内的优化。下面的例子来自它。我正在向它们添加 c++ 和汇编示例。

如果优化看起来过于简单、无趣或没有影响,请记住,这些优化只是更大的优化链中的步骤。蝴蝶效应确实会发生,因为在较早的步骤中看似不重要的优化可能会在以后的步骤中触发更具影响力的优化。

如果这些示例看起来很荒谬(谁会写 x * 10 > 0),请记住,您可以通过常量、宏、模板非常轻松地在 C 和 C++ 中找到此类示例。此外,编译器在其 IR 中应用转换和优化时可以得到此类示例。

有符号整数表达式简化

  • 与0比较消除乘法

    (x * c) cmp 0   ->   x cmp 0 
    bool foo(int x) { return x * 10 > 0 }
    foo(int):
    test edi, edi
    setg al
    ret
  • 乘法后除法

    (x * c1) / c2 -> x * (c1 / c2) if c1 is divisible by c2

    int foo(int x) { return (x * 20) / 10; }
    foo(int):
    lea eax, [rdi+rdi]
    ret
  • 消除否定

    (-x) / (-y) -> x / y

    int foo(int x, int y) { return (-x) / (-y); }
    foo(int, int):
    mov eax, edi
    cdq
    idiv esi
    ret
  • 简化总是对或错的比较

    x + c < x       ->   false
    x + c <= x -> false
    x + c > x -> true
    x + c >= x -> true
    bool foo(int x) { return x + 10 >= x; }
    foo(int):
    mov eax, 1
    ret
  • 消除比较中的否定

    (-x) cmp (-y)   ->   y cmp x
    bool foo(int x, int y) { return -x < -y; }
    foo(int, int):
    cmp edi, esi
    setg al
    ret
  • 减少常量的大小

    x + c > y       ->   x + (c - 1) >= y
    x + c <= y -> x + (c - 1) < y
    bool foo(int x, int y) { return x + 10 <= y; }
    foo(int, int):
    add edi, 9
    cmp edi, esi
    setl al
    ret
  • 消除比较中的常量

    (x + c1) cmp c2         ->   x cmp (c2 - c1)
    (x + c1) cmp (y + c2) -> x cmp (y + (c2 - c1)) if c1 <= c2

    The second transformation is only valid if c1 <= c2, as it wouldotherwise introduce an overflow when y has the value INT_MIN.

    bool foo(int x) { return x + 42 <= 11; }
    foo(int):
    cmp edi, -30
    setl al
    ret

指针算术和类型提升

If an operation does not overflow, then we will get the same result ifwe do the operation in a wider type. This is often useful when doingthings like array indexing on 64-bit architectures — the indexcalculations are typically done using 32-bit int, but the pointers are64-bit, and the compiler may generate more efficient code when signedoverflow is undefined by promoting the 32-bit integers to 64-bitoperations instead of generating type extensions.

One other aspect of this is that undefined overflow ensures that a[i]and a[i+1] are adjacent. This improves analysis of memory accesses forvectorization etc.

这是一个非常重要的优化,因为循环向量化是最有效的优化算法之一。

这是一个将索引从无符号索引更改为有符号索引改进生成程序集的示例:

未签名版本

#include <cstddef>

auto foo(int* v, std::size_t start)
{
int sum = 0;

for (std::size_t i = start; i < start + 4; ++i)
sum += v[i];

return sum;
}

在未签名的情况下start + 4必须考虑回绕,并生成一个分支来处理这种情况(分支不利于性能):

; gcc on x64 with -march=skylake

foo1(int*, unsigned long):
cmp rsi, -5
ja .L3
vmovdqu xmm0, XMMWORD PTR [rdi+rsi*4]
vpsrldq xmm1, xmm0, 8
vpaddd xmm0, xmm0, xmm1
vpsrldq xmm1, xmm0, 4
vpaddd xmm0, xmm0, xmm1
vmovd eax, xmm0
ret
.L3:
xor eax, eax
ret
; clang on x64 with -march=skylake

foo1(int*, unsigned long): # @foo1(int*, unsigned long)
xor eax, eax
cmp rsi, -4
jae .LBB0_2
vpbroadcastq xmm0, qword ptr [rdi + 4*rsi + 8]
vpaddd xmm0, xmm0, xmmword ptr [rdi + 4*rsi]
vpshufd xmm1, xmm0, 85 # xmm1 = xmm0[1,1,1,1]
vpaddd xmm0, xmm0, xmm1
vmovd eax, xmm0
.LBB0_2:
ret

附带说明,使用更窄的类型会导致更糟糕的汇编,从而禁止使用 SSE 矢量化指令:

#include <cstddef>

auto foo(int* v, unsigned start)
{
int sum = 0;

for (unsigned i = start; i < start + 4; ++i)
sum += v[i];

return sum;
}
; gcc on x64 with -march=skylake

foo(int*, unsigned int):
cmp esi, -5
ja .L3
mov eax, esi
mov eax, DWORD PTR [rdi+rax*4]
lea edx, [rsi+1]
add eax, DWORD PTR [rdi+rdx*4]
lea edx, [rsi+2]
add eax, DWORD PTR [rdi+rdx*4]
lea edx, [rsi+3]
add eax, DWORD PTR [rdi+rdx*4]
ret
.L3:
xor eax, eax
ret
; clang on x64 with -march=skylake

foo(int*, unsigned int): # @foo(int*, unsigned int)
xor eax, eax
cmp esi, -5
ja .LBB0_3
mov ecx, esi
add esi, 4
mov eax, dword ptr [rdi + 4*rcx]
lea rdx, [rcx + 1]
cmp rdx, rsi
jae .LBB0_3
add eax, dword ptr [rdi + 4*rcx + 4]
add eax, dword ptr [rdi + 4*rcx + 8]
add eax, dword ptr [rdi + 4*rcx + 12]
.LBB0_3:
ret

签名版

然而,使用有符号索引会产生很好的矢量化无分支代码:

#include <cstddef>

auto foo(int* v, std::ptrdiff_t start)
{
int sum = 0;

for (std::ptrdiff_t i = start; i < start + 4; ++i)
sum += v[i];

return sum;
}
; gcc on x64 with -march=skylake

foo(int*, long):
vmovdqu xmm0, XMMWORD PTR [rdi+rsi*4]
vpsrldq xmm1, xmm0, 8
vpaddd xmm0, xmm0, xmm1
vpsrldq xmm1, xmm0, 4
vpaddd xmm0, xmm0, xmm1
vmovd eax, xmm0
ret
; clang on x64 with -march=skylake

foo(int*, long): # @foo(int*, long)
vpbroadcastq xmm0, qword ptr [rdi + 4*rsi + 8]
vpaddd xmm0, xmm0, xmmword ptr [rdi + 4*rsi]
vpshufd xmm1, xmm0, 85 # xmm1 = xmm0[1,1,1,1]
vpaddd xmm0, xmm0, xmm1
vmovd eax, xmm0
ret

在使用较窄的有符号类型时仍然使用向量化指令:

#include <cstddef>

auto foo(int* v, int start)
{
int sum = 0;

for (int i = start; i < start + 4; ++i)
sum += v[i];

return sum;
}
; gcc on x64 with -march=skylake

foo(int*, int):
movsx rsi, esi
vmovdqu xmm0, XMMWORD PTR [rdi+rsi*4]
vpsrldq xmm1, xmm0, 8
vpaddd xmm0, xmm0, xmm1
vpsrldq xmm1, xmm0, 4
vpaddd xmm0, xmm0, xmm1
vmovd eax, xmm0
ret
; clang on x64 with -march=skylake

foo(int*, int): # @foo(int*, int)
movsxd rax, esi
vpbroadcastq xmm0, qword ptr [rdi + 4*rax + 8]
vpaddd xmm0, xmm0, xmmword ptr [rdi + 4*rax]
vpshufd xmm1, xmm0, 85 # xmm1 = xmm0[1,1,1,1]
vpaddd xmm0, xmm0, xmm1
vmovd eax, xmm0
ret

数值范围计算

The compiler keeps track of the variables' range of possible values ateach point in the program, i.e. for code such as

int x = foo();
if (x > 0) {
int y = x + 5;
int z = y / 4;

it determines that x has the range [1, INT_MAX] after theif-statement, and can thus determine that y has the range [6, INT_MAX] as overflow is not allowed. And the next line can beoptimized to int z = y >> 2; as the compiler knows that y isnon-negative.

auto foo(int x)
{
if (x <= 0)
__builtin_unreachable();

return (x + 5) / 4;
}
foo(int):
lea eax, [rdi+5]
sar eax, 2
ret

The undefined overflow helps optimizations that need to compare twovalues (as the wrapping case would give possible values of the form[INT_MIN, (INT_MIN+4)] or [6, INT_MAX] that prevents all usefulcomparisons with < or >), such as

  • Changing comparisons x<y to true or false if the ranges for x and y does not overlap
  • Changing min(x,y) or max(x,y) to x or y if the ranges do not overlap
  • Changing abs(x) to x or -x if the range does not cross 0
  • Changing x/c to x>>log2(c) if x>0 and the constant c is a power of 2
  • Changing x%c to x&(c-1) if x>0 and the constant c is a power of 2

循环分析与优化

The canonical example of why undefined signed overflow helps loopoptimizations is that loops like

for (int i = 0; i <= m; i++)

are guaranteed to terminate for undefined overflow. This helpsarchitectures that have specific loop instructions, as they do ingeneral not handle infinite loops.

But undefined signed overflow helps many more loop optimizations. Allanalysis such as determining number of iteration, transforminginduction variables, and keeping track of memory accesses are usingeverything in the previous sections in order to do its work. Inparticular, the set of loops that can be vectorized are severelyreduced when signed overflow is allowed.

关于c++ - 是否有一些有意义的统计数据来证明保持有符号整数算术溢出未定义?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/56047702/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com