c++ - Boolean values as 8 bit in compilers. Are operations on them inefficient?

Reposted by: 行者123. Updated: 2023-12-01 18:39:05

I'm reading Agner Fog's "Optimizing software in C++" (specific to x86 processors from Intel, AMD and VIA). On page 34 it states:

Boolean variables are stored as 8-bit integers with the value 0 for false and 1 for true. Boolean variables are overdetermined in the sense that all operators that have Boolean variables as input check if the inputs have any other value than 0 or 1, but operators that have Booleans as output can produce no other value than 0 or 1. This makes operations with Boolean variables as input less efficient than necessary.

Is this still true today, and on which compilers? Can you give an example? The author states:

The Boolean operations can be made much more efficient if it is known with certainty that the operands have no other values than 0 and 1. The reason why the compiler doesn't make such an assumption is that the variables might have other values if they are uninitialized or come from unknown sources.

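Does this mean that if I take a function pointer bool(*)(), for example, and call it, then operations on its result produce inefficient code? Or is it the case when I access a boolean by dereferencing a pointer or reading from a reference and then operate on it?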

Best answer

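TL:DR: current compilers still have bool missed optimizations when doing things like (a&&b) ? x : y. But the reason is not that they don't assume 0/1; they just do a poor job at this.
Many uses of bool are for locals or inline functions, so booleanizing to 0/1 can optimize away, and the code can branch (or cmov, or whatever) on the original condition. Only worry about optimizing bool inputs/outputs when a bool really does have to be passed/returned across something that doesn't inline, or is really stored in memory.
Possible optimization guideline: combine bools from external sources (function args / memory) with bitwise operators, like a&b. MSVC and ICC do better with this. IDK if it's ever worse for local bools. Beware that a&b is only equivalent to a&&b for bool, not for integer types: 2 && 1 is true, but 2 & 1 is 0, which is false. Bitwise OR doesn't have this problem.
IDK if this guideline will ever hurt for locals that were set from a comparison within the function (or in something that inlined). For example, it might lead the compiler to actually materialize integer booleans instead of just using comparison results directly when possible. Also note that it doesn't seem to help with current gcc and clang.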

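Yes, C++ implementations on x86 store bool in a byte that's always 0 or 1 (at least across function-call boundaries, where the compiler has to respect the ABI/calling convention, which requires this).
Compilers do sometimes take advantage of this. For example, for bool -> int conversion even gcc 4.4 simply zero-extends to 32-bit (movzx eax, dil). Clang and MSVC do this, too. C and C++ rules require this conversion to produce 0 or 1, so this behaviour is only safe if it's always safe to assume that a bool function arg or global variable has a 0 or 1 value.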
Even old compilers typically did take advantage of it for bool -> int, but not in other cases. Thus, Agner is wrong about the reason when he says:

The reason why the compiler doesn't make such an assumption is that the variables might have other values if they are uninitialized or come from unknown sources.

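MSVC CL19 does make code that assumes bool function args are 0 or 1, so the Windows x86-64 ABI must guarantee this.
In the x86-64 System V ABI (used by everything other than Windows), the changelog for revision 0.98 says "Specify that _Bool (aka bool) is booleanized at the caller." I think compilers were assuming this even before that change, and the revision just documents what they were already relying on. The current language in the x86-64 SysV ABI is: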

3.1.2 Data Representation

Booleans, when stored in a memory object, are stored as single byte objects the value of which is always 0 (false) or 1 (true). When stored in integer registers (except for passing as arguments), all 8 bytes of the register are significant; any nonzero value is considered true.

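The second sentence is nonsense. The ABI has no business telling compilers how to store things in registers inside a function. It only governs the boundaries between different compilation units (memory, function args and return values). I reported this ABI defect a while ago on the github page where it's maintained.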

3.2.3 Parameter passing:

When a value of type _Bool is returned or passed in a register or on the stack, bit 0 contains the truth value and bits 1 to 7 shall be zero¹⁶.

(footnote 16): Other bits are left unspecified, hence the consumer side of those values can rely on it being 0 or 1 when truncated to 8 bit.

The language in the i386 System V ABI is the same, IIRC.

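Any compiler that assumes 0/1 for one thing (e.g. conversion to int) but fails to take advantage of it in other cases has a missed optimization. Unfortunately such missed optimizations still exist, although they are rarer than when Agner wrote that paragraph about compilers always re-booleanizing.
(Source + asm on the Godbolt compiler explorer for gcc4.6/4.7 and clang/MSVC. See also Matt Godbolt's CppCon2017 talk What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid.)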

bool logical_or(bool a, bool b) { return a||b; }

 # gcc4.6.4 -O3 for the x86-64 System V ABI
    test    dil, dil            # test a against itself (for non-zero)
    mov     eax, 1
    cmove   eax, esi            # return   a ? 1 : b;
    ret

So even gcc4.6 didn't re-booleanize b, but it did miss the optimization that gcc4.7 makes (as do clang and later compilers, as shown in other answers):

    # gcc4.7 -O3 to present: looks ideal to me.
    mov     eax, esi
    or      eax, edi
    ret

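(Clang's or dil, sil / mov eax, edi is silly. It's guaranteed to cause a partial-register stall on Nehalem or earlier Intel when reading edi after writing dil. It also has worse code size, because it needs a REX prefix to use the low-8 part of edi. A better choice might be or dil,sil / movzx eax, dil. That also avoids reading any 32-bit registers, in case your caller left some arg-passing registers with "dirty" partial registers.)
MSVC emits this code, which checks a then b separately and fails to take advantage of anything. It even uses xor al,al instead of xor eax,eax, so it has a false dependency on the old value of eax on most CPUs (including Haswell/Skylake, which don't rename low-8 partial regs separately from the whole register, only AH/BH/...). This is just dumb. The only reason to ever use xor al,al is when you explicitly want to preserve the upper bytes.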

logical_or PROC                     ; x86-64 MSVC CL19
    test     cl, cl                 ; Windows ABI passes args in ecx, edx
    jne      SHORT $LN3@logical_or
    test     dl, dl
    jne      SHORT $LN3@logical_or
    xor      al, al                 ; missed peephole: xor eax,eax is strictly better
    ret      0
$LN3@logical_or:
    mov      al, 1
    ret      0
logical_or ENDP

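ICC18 also doesn't take advantage of the known 0/1 nature of the inputs. It just uses an or instruction to set flags according to the bitwise OR of the two inputs, and setcc to produce a 0/1.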

logical_or(bool, bool):             # ICC18
    xor       eax, eax                                      #4.42
    movzx     edi, dil                                      #4.33
    movzx     esi, sil                                      #4.33
    or        edi, esi                                      #4.42
    setne     al                                            #4.42
    ret                                                     #4.42

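ICC emits the same code even for bool bitwise_or(bool a, bool b) { return a|b; }. It promotes to int (with movzx), and uses or to set flags according to the bitwise OR. This is dumb compared to or dil,sil / setne al.
For bitwise_or, MSVC just uses an or instruction (after movzx on each input), but at least it doesn't re-booleanize.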

Missed optimizations in current gcc/clang:
Only ICC/MSVC were making dumb code with the simple function above, but this function still gives gcc and clang trouble:

int select(bool a, bool b, int x, int y) {
    return (a&&b) ? x : y;
}

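Source+asm on the Godbolt compiler explorer (same source, but different compilers selected than last time).
Looks simple enough; you'd hope that a smart compiler would do it branchlessly with one test/cmov. x86's test instruction sets flags according to a bitwise AND. It's an AND instruction that doesn't actually write the destination (just like cmp is a sub that doesn't write the destination).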

# hand-written implementation that no compilers come close to making
select:
    mov     eax, edx      # retval = x
    test    edi, esi      # ZF =  ((a & b) == 0)
    cmovz   eax, ecx      # conditional move: return y if ZF is set
    ret

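But even the daily builds of gcc and clang on the Godbolt compiler explorer make much more complicated code, checking each boolean separately. They know how to optimize bool ab = a&&b; if you return ab. But even writing it that way (with a separate boolean variable to hold the result) doesn't manage to hand-hold them into making code that doesn't suck.
Note that test same,same is exactly equivalent to cmp reg, 0, and is smaller, so it's what compilers use.
Clang's version is strictly worse than my hand-written version. (Note that it requires the caller to have zero-extended the bool args to 32-bit. Clang does the same for narrow integer types, as an unofficial part of the ABI that gcc and clang both implement but only clang depends on.)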

select:  # clang 6.0 trunk 317877 nightly build on Godbolt
    test    esi, esi
    cmove   edx, ecx         # x = b ? y : x
    test    edi, edi
    cmove   edx, ecx         # x = a ? y : x
    mov     eax, edx         # return x
    ret

gcc 8.0.0 20171110 nightly makes branchy code for this, similar to what older gcc versions do.

select(bool, bool, int, int):   # gcc 8.0.0-pre   20171110
    test    dil, dil
    mov     eax, edx          ; compiling with -mtune=intel or -mtune=haswell would keep test/jcc together for macro-fusion.
    je      .L8
    test    sil, sil
    je      .L8
    rep ret
.L8:
    mov     eax, ecx
    ret

MSVC x86-64 CL19 makes very similar branchy code. It targets the Windows calling convention, where integer args are in rcx, rdx, r8, r9.

select PROC
        test     cl, cl         ; a
        je       SHORT $LN3@select
        mov      eax, r8d       ; retval = x
        test     dl, dl         ; b
        jne      SHORT $LN4@select
$LN3@select:
        mov      eax, r9d       ; retval = y
$LN4@select:
        ret      0              ; 0 means rsp += 0 after popping the return address, not C return 0.
                                ; MSVC doesn't emit the `ret imm16` opcode here, so IDK why they put an explicit 0 as an operand.
select ENDP

ICC18 also makes branchy code, but with both mov instructions after the branches.

select(bool, bool, int, int):
        test      dil, dil                                      #8.13
        je        ..B4.4        # Prob 50%                      #8.13
        test      sil, sil                                      #8.16
        jne       ..B4.5        # Prob 50%                      #8.16
..B4.4:                         # Preds ..B4.2 ..B4.1
        mov       edx, ecx                                      #8.13
..B4.5:                         # Preds ..B4.2 ..B4.4
        mov       eax, edx                                      #8.13
        ret                                                     #8.13

Trying to help the compiler by using

int select2(bool a, bool b, int x, int y) {
    bool ab = a&&b;
    return (ab) ? x : y;
}

leads MSVC into making hilariously bad code:

;; MSVC CL19  -Ox  = full optimization
select2 PROC
    test     cl, cl
    je       SHORT $LN3@select2
    test     dl, dl
    je       SHORT $LN3@select2
    mov      al, 1              ; ab = 1

    test     al, al             ;; and then test/cmov on an immediate constant!!!
    cmovne   r9d, r8d
    mov      eax, r9d
    ret      0
$LN3@select2:
    xor      al, al            ;; ab = 0

    test     al, al            ;; and then test/cmov on another path with known-constant condition.
    cmovne   r9d, r8d
    mov      eax, r9d
    ret      0
select2 ENDP

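This only happens with MSVC (and ICC18 has the same missed optimization of test/cmov on a register that was just set to a constant).
As usual, gcc and clang don't make code as bad as MSVC's. They make the same asm they do for select(). That's still not good, but at least trying to help them doesn't make things worse, as it does with MSVC.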

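Combining bool with bitwise operators helps MSVC and ICC
In my very limited testing, | and & seem to work better than || and && for MSVC and ICC. Look at the compiler output for your own code with your compiler + compile options to see what happens.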

int select_bitand(bool a, bool b, int x, int y) {
    return (a&b) ? x : y;
}

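Gcc still branches separately on separate tests of the two inputs, the same code as for the other versions of select. clang still does two separate test/cmov pairs, the same asm as for the other source versions.
MSVC comes through and optimizes correctly, beating all other compilers (at least in the stand-alone definition):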

select_bitand PROC            ;; MSVC
    test     cl, dl           ;; ZF =  !(a & b)
    cmovne   r9d, r8d
    mov      eax, r9d         ;; could have done the mov to eax in parallel with the test, off the critical path, but close enough.
    ret      0

ICC18 wastes two movzx instructions zero-extending the bools to int, but then makes the same code as MSVC:

select_bitand:          ## ICC18
    movzx     edi, dil                                      #16.49
    movzx     esi, sil                                      #16.49
    test      edi, esi                                      #17.15
    cmovne    ecx, edx                                      #17.15
    mov       eax, ecx                                      #17.15
    ret                                                     #17.15

Regarding "c++ - Boolean values as 8 bit in compilers. Are operations on them inefficient?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47243955/