c++ - 是否有一些有意义的统计数据来证明保持有符号整数算术溢出未定义？

转载作者：IT老高更新时间：2023-10-28 12:34:49

C 标准明确指定有符号整数溢出具有未定义行为。然而，大多数 CPU 使用已定义的溢出语义实现有符号算术(除法溢出可能除外:x/0 和 INT_MIN/-1)。

编译器编写者一直在利用这种溢出的未定义性来添加更积极的优化，这些优化往往会以非常微妙的方式破坏遗留代码。例如，此代码可能在较旧的编译器上工作，但在当前版本的 gcc 和 clang 上不再适用:

/* Increment a by a value in 0..255, clamp a to positive integers.
   The code relies on 32-bit wrap-around, but the C Standard makes
   signed integer overflow undefined behavior, so sum_max can now 
   return values less than a. There are Standard compliant ways to
   implement this, but legacy code is what it is... */
int sum_max(int a, unsigned char b) {
    int res = a + b;
    return (res >= a) ? res : INT_MAX;
}

是否有确凿的证据表明这些优化是值得的？是否有比较研究记录了实际示例甚至经典基准的实际改进？

我在看这个的时候想出了这个问题:C++Now 2018: John Regehr “Closing Keynote: Undefined Behavior and Compiler Optimizations”

我正在标记 c 和 c++，因为这两种语言的问题相似，但答案可能不同。

最佳答案

我不了解研究和统计数据，但是是的，考虑到编译器实际所做的这一点，肯定有优化。是的，它们非常重要(例如 tldr 循环矢量化)。

除了编译器优化之外，还有另一个方面需要考虑。使用 UB，您可以获得 C/C++ 有符号整数的算术行为，就像您在数学上所期望的那样。例如 x + 10 > x现在成立(当然对于有效的代码)，但不会在环绕行为。

我发现了一篇很棒的文章 How undefined signed overflow enables optimizations in GCC来自 Krister Walfridsson 的博客列出了一些将签名溢出 UB 考虑在内的优化。下面的例子来自它。我正在向它们添加 c++ 和汇编示例。

如果优化看起来过于简单、无趣或没有影响，请记住，这些优化只是更大的优化链中的步骤。蝴蝶效应确实会发生，因为在较早的步骤中看似不重要的优化可能会在以后的步骤中触发更具影响力的优化。

如果这些示例看起来很荒谬(谁会写 x * 10 > 0)，请记住，您可以通过常量、宏、模板非常轻松地在 C 和 C++ 中找到此类示例。此外，编译器在其 IR 中应用转换和优化时可以得到此类示例。

有符号整数表达式简化

与0比较消除乘法

(x * c) cmp 0   ->   x cmp 0

bool foo(int x) { return x * 10 > 0 }

foo(int):
        test    edi, edi
        setg    al
        ret

乘法后除法

(x * c1) / c2 -> x * (c1 / c2) if c1 is divisible by c2

int foo(int x) { return (x * 20) / 10; }

foo(int):
        lea     eax, [rdi+rdi]
        ret

消除否定

(-x) / (-y) -> x / y

int foo(int x, int y) { return (-x) / (-y); }

foo(int, int):
        mov     eax, edi
        cdq
        idiv    esi
        ret

简化总是对或错的比较

x + c < x       ->   false
x + c <= x      ->   false
x + c > x       ->   true
x + c >= x      ->   true

bool foo(int x) { return x + 10 >= x; }

foo(int):
        mov     eax, 1
        ret

消除比较中的否定

(-x) cmp (-y)   ->   y cmp x

bool foo(int x, int y) { return -x < -y; }

foo(int, int):
        cmp     edi, esi
        setg    al
        ret

减少常量的大小

x + c > y       ->   x + (c - 1) >= y
x + c <= y      ->   x + (c - 1) < y

bool foo(int x, int y) { return x + 10 <= y; }

foo(int, int):
        add     edi, 9
        cmp     edi, esi
        setl    al
        ret

消除比较中的常量

(x + c1) cmp c2         ->   x cmp (c2 - c1)
(x + c1) cmp (y + c2)   ->   x cmp (y + (c2 - c1)) if c1 <= c2

The second transformation is only valid if c1 <= c2, as it wouldotherwise introduce an overflow when y has the value INT_MIN.

bool foo(int x) { return x + 42 <= 11; }

foo(int):
        cmp     edi, -30
        setl    al
        ret

指针算术和类型提升

If an operation does not overflow, then we will get the same result ifwe do the operation in a wider type. This is often useful when doingthings like array indexing on 64-bit architectures — the indexcalculations are typically done using 32-bit int, but the pointers are64-bit, and the compiler may generate more efficient code when signedoverflow is undefined by promoting the 32-bit integers to 64-bitoperations instead of generating type extensions.

One other aspect of this is that undefined overflow ensures that a[i]and a[i+1] are adjacent. This improves analysis of memory accesses forvectorization etc.

这是一个非常重要的优化，因为循环向量化是最有效的优化算法之一。

这是一个将索引从无符号索引更改为有符号索引改进生成程序集的示例:

未签名版本

#include <cstddef>

auto foo(int* v, std::size_t start)
{
    int sum = 0;

    for (std::size_t i = start; i < start + 4; ++i)
        sum += v[i];

    return sum;
}

在未签名的情况下start + 4必须考虑回绕，并生成一个分支来处理这种情况(分支不利于性能):

; gcc on x64 with -march=skylake

foo1(int*, unsigned long):
        cmp     rsi, -5
        ja      .L3
        vmovdqu xmm0, XMMWORD PTR [rdi+rsi*4]
        vpsrldq xmm1, xmm0, 8
        vpaddd  xmm0, xmm0, xmm1
        vpsrldq xmm1, xmm0, 4
        vpaddd  xmm0, xmm0, xmm1
        vmovd   eax, xmm0
        ret
.L3:
        xor     eax, eax
        ret

; clang on x64 with -march=skylake

foo1(int*, unsigned long):                             # @foo1(int*, unsigned long)
        xor     eax, eax
        cmp     rsi, -4
        jae     .LBB0_2
        vpbroadcastq    xmm0, qword ptr [rdi + 4*rsi + 8]
        vpaddd  xmm0, xmm0, xmmword ptr [rdi + 4*rsi]
        vpshufd xmm1, xmm0, 85                  # xmm1 = xmm0[1,1,1,1]
        vpaddd  xmm0, xmm0, xmm1
        vmovd   eax, xmm0
.LBB0_2:
        ret

附带说明，使用更窄的类型会导致更糟糕的汇编，从而禁止使用 SSE 矢量化指令:

#include <cstddef>

auto foo(int* v, unsigned start)
{
    int sum = 0;

    for (unsigned i = start; i < start + 4; ++i)
        sum += v[i];

    return sum;
}

; gcc on x64 with -march=skylake

foo(int*, unsigned int):
        cmp     esi, -5
        ja      .L3
        mov     eax, esi
        mov     eax, DWORD PTR [rdi+rax*4]
        lea     edx, [rsi+1]
        add     eax, DWORD PTR [rdi+rdx*4]
        lea     edx, [rsi+2]
        add     eax, DWORD PTR [rdi+rdx*4]
        lea     edx, [rsi+3]
        add     eax, DWORD PTR [rdi+rdx*4]
        ret
.L3:
        xor     eax, eax
        ret

; clang on x64 with -march=skylake

foo(int*, unsigned int):                              # @foo(int*, unsigned int)
        xor     eax, eax
        cmp     esi, -5
        ja      .LBB0_3
        mov     ecx, esi
        add     esi, 4
        mov     eax, dword ptr [rdi + 4*rcx]
        lea     rdx, [rcx + 1]
        cmp     rdx, rsi
        jae     .LBB0_3
        add     eax, dword ptr [rdi + 4*rcx + 4]
        add     eax, dword ptr [rdi + 4*rcx + 8]
        add     eax, dword ptr [rdi + 4*rcx + 12]
.LBB0_3:
        ret

签名版

然而，使用有符号索引会产生很好的矢量化无分支代码:

#include <cstddef>

auto foo(int* v, std::ptrdiff_t start)
{
    int sum = 0;

    for (std::ptrdiff_t i = start; i < start + 4; ++i)
        sum += v[i];

    return sum;
}

; gcc on x64 with -march=skylake

foo(int*, long):
        vmovdqu xmm0, XMMWORD PTR [rdi+rsi*4]
        vpsrldq xmm1, xmm0, 8
        vpaddd  xmm0, xmm0, xmm1
        vpsrldq xmm1, xmm0, 4
        vpaddd  xmm0, xmm0, xmm1
        vmovd   eax, xmm0
        ret

; clang on x64 with -march=skylake

foo(int*, long):                              # @foo(int*, long)
        vpbroadcastq    xmm0, qword ptr [rdi + 4*rsi + 8]
        vpaddd  xmm0, xmm0, xmmword ptr [rdi + 4*rsi]
        vpshufd xmm1, xmm0, 85                  # xmm1 = xmm0[1,1,1,1]
        vpaddd  xmm0, xmm0, xmm1
        vmovd   eax, xmm0
        ret

在使用较窄的有符号类型时仍然使用向量化指令:

#include <cstddef>

auto foo(int* v, int start)
{
    int sum = 0;

    for (int i = start; i < start + 4; ++i)
        sum += v[i];

    return sum;
}

; gcc on x64 with -march=skylake

foo(int*, int):
        movsx   rsi, esi
        vmovdqu xmm0, XMMWORD PTR [rdi+rsi*4]
        vpsrldq xmm1, xmm0, 8
        vpaddd  xmm0, xmm0, xmm1
        vpsrldq xmm1, xmm0, 4
        vpaddd  xmm0, xmm0, xmm1
        vmovd   eax, xmm0
        ret

; clang on x64 with -march=skylake

foo(int*, int):                              # @foo(int*, int)
        movsxd  rax, esi
        vpbroadcastq    xmm0, qword ptr [rdi + 4*rax + 8]
        vpaddd  xmm0, xmm0, xmmword ptr [rdi + 4*rax]
        vpshufd xmm1, xmm0, 85                  # xmm1 = xmm0[1,1,1,1]
        vpaddd  xmm0, xmm0, xmm1
        vmovd   eax, xmm0
        ret

数值范围计算

The compiler keeps track of the variables' range of possible values ateach point in the program, i.e. for code such as
int x = foo();
if (x > 0) {
  int y = x + 5;
  int z = y / 4;
it determines that x has the range [1, INT_MAX] after theif-statement, and can thus determine that y has the range [6, INT_MAX] as overflow is not allowed. And the next line can beoptimized to int z = y >> 2; as the compiler knows that y isnon-negative.

auto foo(int x)
{
    if (x <= 0)
        __builtin_unreachable();
    
    return (x + 5) / 4;
}

foo(int):
        lea     eax, [rdi+5]
        sar     eax, 2
        ret

The undefined overflow helps optimizations that need to compare twovalues (as the wrapping case would give possible values of the form[INT_MIN, (INT_MIN+4)] or [6, INT_MAX] that prevents all usefulcomparisons with < or >), such as

Changing comparisons x<y to true or false if the ranges for x and y does not overlap

Changing min(x,y) or max(x,y) to x or y if the ranges do not overlap

Changing abs(x) to x or -x if the range does not cross 0

Changing x/c to x>>log2(c) if x>0 and the constant c is a power of 2

Changing x%c to x&(c-1) if x>0 and the constant c is a power of 2

循环分析与优化

The canonical example of why undefined signed overflow helps loopoptimizations is that loops like
for (int i = 0; i <= m; i++)
are guaranteed to terminate for undefined overflow. This helpsarchitectures that have specific loop instructions, as they do ingeneral not handle infinite loops.

But undefined signed overflow helps many more loop optimizations. Allanalysis such as determining number of iteration, transforminginduction variables, and keeping track of memory accesses are usingeverything in the previous sections in order to do its work. Inparticular, the set of loops that can be vectorized are severelyreduced when signed overflow is allowed.

关于c++ - 是否有一些有意义的统计数据来证明保持有符号整数算术溢出未定义？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/56047702/

文章推荐： macos - 通过命令行在 Mac 上安装 Docker Toolbox

文章推荐： linux - 用于批量重命名文件夹中文件的 Shell/Bash 快捷方式

文章推荐： android-studio - Flutter Android Studio 构建失败

文章推荐： python - 在 Python 中生成一个随机字母

统计、机器学习和数据挖掘
我目前正在学习数据挖掘，有以下问题。机器学习和数据挖掘之间有什么关系？我发现许多数据挖掘技术都与统计相关，而我“听说”数据挖掘与机器学习有很多关系。所以我的问题是:机器学习与统计学密切相关吗？如
R 统计，将排名表更改为配对
我有很多表的数据，例如: event_id player finish 1 a 1 1 b 2 1 c
Nginx http_status_module 统计
我对 http_status_module 提供的统计数据感兴趣特别是上游部分的统计数据。 http://nginx.org/en/docs/http/ngx_http_status_module.
akka - Akka应用中的JMX监控/统计
除了 Cluster MBean 之外，是否有任何可以在 Akka (Java) 中启用的内置 JMX 公开监控/统计信息？我看过 Typesafe Console，但由于它需要许可证才能用于从多个节
c++ - “rusage” 统计
我正在尝试在我的程序中使用“usage”统计信息来获取类似于 time 的数据工具。但是，我很确定我做错了什么。这些值似乎是正确的，但有时可能有点奇怪。我没有在网上找到好的资源。有人知道如何做得更好吗
PHP、MySQL 统计
我有一个带有统计表的 MySQL 数据库。我想以年历、月度的形式输出数据。对于没有点击率的几个月，我想花费一个“空”DIV。有两个ID。 $query = mysqli_query($db,"SELE
python - 统计:优化python中的概率计算
设置: 问题是经典概率问题的复杂形式: 70 colored balls are placed in an urn, 10 for each of the seven rainbow colors.
Ruby 统计 gem
有哪些 Ruby gem 可以执行数据处理？最佳答案我知道有 3 种从 Ruby 访问 R 的方法: RinRuby RSRuby 通过 Rserve-Ruby-Client 预约 RinRuby
python实战之实现excel读取、统计、写入的示例讲解
背景图像领域内的一个国内会议快要召开了，要发各种邀请邮件，之后要录入、统计邮件回复（参会还是不参会等）。如此重要的任务，老师就托付给我了。ps: 统计回复邮件的时候，能知道谁参会或谁不参会。
java - 统计 ArrayList 中一定范围内的数字
我正在添加用户输入的几个数字并将它们添加到数组列表中。到目前为止我的代码: package project143; import java.util.*; /** * @author -- */
google-analytics - 离线应用程序的分析+统计？
正如标题所示，我需要做的是在各种 iO/Android/Windows 应用程序中跟踪各种用户事件 - 例如点击、滑动、在页面上花费的时间等。这些应用程序基于响应式 HTML/CSS/JS，并具有简
javascript - 统计 HTML 表中唯一值的实例
我希望计算 HTML 表中每个唯一值的实例数，并在其自己的表中返回结果。该表是根据用户的文本输入生成的。例如，用户输入可能如下所示: Report 46 Bob Marley 4/20/2
php 统计 z 分数正态分布
如何使用 PHP 计算数字数组的 z 分数？我需要计算 z 分数，然后找到百分位数 (CDF)!我可以使用哪些 PHP 函数？谢谢! 最佳答案以下代码将给出 CDF 的良好近似值(Abramowit
github - 统计 GitHub 上的空存储库数量
我只是想知道是否可以计算 GitHub 上空存储库的总数。如果不适合所有用户，可以为自己做吗？编辑我已经尝试过size:0搜索，但似乎返回了很多包含数据的存储库。采用 size:0..1 之类的
java - 统计 url 中的外部链接
public class Scanner { private HtmlProcessor hp; private String baseUrl; private int ste
java - 统计 Mule 流处理的请求数
我正在使用 Mule ESB 3.4。我想开发一个自定义 Java 组件来计算流收到的请求数量。流程将例如像这样: http inbound-endpoint -> counter -> vm-out
github - 统计 GitHub 上开放的拉取请求和问题
我喜欢借助 GitHub API 来统计存储库中所有开放的拉取请求和问题。我发现 API 端点 /repos/:owner/:repo 结果包含 open_issues 属性。然而，这是问题和拉取请求
php 统计 z 分数正态分布
如何使用 PHP 计算数字数组的 z 分数？我需要计算 z 分数，然后找到百分位数 (CDF)!我可以使用哪些 PHP 函数？谢谢! 最佳答案以下代码将给出 CDF 的良好近似值(Abramowit
c# - 统计 MongoDB 中的项目数
已关闭。此问题需要 debugging details 。目前不接受答案。编辑问题以包含 desired behavior, a specific problem or error, and the
python - 统计 Google 新闻搜索结果
我正在尝试以编程方式获取搜索字词列表的 Google 新闻搜索结果计数(即有多少个结果)，但仅限于过去 1 年。使用用户界面搜索时，结果计数仅出现在常规搜索中，但在“工具 > 最近 > 过去一年”下时

IT老高

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城