assembly - What is the purpose of the loop "xorl %edx,%eax; shrl $1,%edx"?

Reposted by: 行者123. Updated: 2023-12-05 00:18:01

I have the following x86 assembly code:

  movl   8(%ebp), %edx  //get an argument from the caller
  movl   $0, %eax
  testl  %edx, %edx
  je     .L1            
.L2:                   // what's the purpose of this loop body?
  xorl   %edx, %eax
  shrl   $1, %edx
  jne    .L2
.L1:
  andl   $1, %eax

The textbook gives the corresponding C code as follows:

int f1(unsigned x)
{
    int y = 0;
    while(x != 0) {
        __________;
    }
    return __________;
 }

The book asks the reader to fill in the blanks and answer the question "What does it do?"

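I can't combine the loop body into a single C expression. I can tell what the loop body does, but I have no idea what it is for. The textbook also says that %eax holds the return value here. So what is the purpose of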

andl  $1, %eax

I don't understand that either.

Best answer

It looks like the purpose of the whole loop is to XOR together all the bits of the 32-bit arg, i.e. to compute its parity.

Working backward from the last instruction ( and $1,%eax ), we know that only the low bit of the result matters.

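With that in mind, xor %edx,%eax becomes clearer: it XORs the current low bit of %edx into %eax. The garbage in the high bits doesn't matter.
The shr loop runs until all of x's bits have been shifted out. We could always loop 32 times to visit every bit, but that is less efficient than stopping as soon as x is 0. (Because of how XOR works, we don't need to actually XOR in the 0 bits; they have no effect.)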

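Once we know what the function does, filling in the C becomes an exercise in clever/compact C syntax. I thought at first that y ^= (x>>=1); would fit inside the loop, but that shifts x before it is used the first time.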

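The only way I see to do it in one C statement is with the , operator (which does introduce a sequence point, so it is safe to read x on the left side and modify it on the right side of the , ). So y ^= x, x>>=1; fits.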
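To make the comma-operator form concrete, here is a minimal sketch of the textbook skeleton with both blanks filled in. The function name f1_comma is made up for this example; everything else follows the skeleton in the question.

```c
/* Sketch only: the textbook skeleton with the two blanks filled in.
   f1_comma is a hypothetical name used for illustration. */
int f1_comma(unsigned x)
{
    int y = 0;
    while (x != 0) {
        /* Comma operator: y ^= x is fully evaluated before x is shifted. */
        y ^= x, x >>= 1;
    }
    /* Only bit 0 of y is meaningful; the higher bits are garbage. */
    return y & 1;
}
```

Calling it with 7 (three set bits) should give 1, and with 3 (two set bits) should give 0.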

Or, for more readable code, just cheat and put two statements on the same line, separated by a ; .

int f1(unsigned x) {
    int y = 0;
    while(x != 0) {
        y ^= x;  x>>=1;      
    }
    return y & 1;
 }

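This compiles to essentially the same asm as shown in the question, using gcc5.3 -O3 on the Godbolt compiler explorer. The question's code de-optimizes the xor-zeroing idiom to mov $0, %eax , and optimizes away gcc's silly duplication of the ret instruction. (Or maybe it came from an earlier version of gcc that didn't do that.)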

The loop is very inefficient. Here is an efficient way:

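We don't need a loop with O(n) complexity (where n is the width of x in bits). Instead, we can get O(log2(n)) complexity, and by taking advantage of an x86 trick we only need to do the first two steps of that.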

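I have left off the operand-size suffix on instructions where it is determined by the registers. (Except for xorw , to make the 16-bit xor explicit.)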

#untested
parity:
    # no frame-pointer boilerplate

    xor       %eax,%eax        # zero eax (so the upper 24 bits of the int return value are zeroed).  And yes, this is more efficient than mov $0, %eax
                               # so when we set %al later, the whole of %eax will be good.

    movzwl    4(%esp), %edx      # load low 16 bits of `x`.  (Zero-extending into the full %edx is for efficiency; movw 4(%esp), %dx would work too.)
    xorw      6(%esp), %dx       # xor the high 16 bits of `x`
    # Two loads instead of a load + copy + shift is probably a win, because cache is fast.
    xor       %dh, %dl           # xor the two 8 bit halves, setting PF according to the result
    setnp      %al               # get the inverse of the CPU's parity flag.  Remember that the rest of %eax is already zero, so the result is already zero-extended to 32-bits (int return value)
    ret

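Yes, that's right: x86 has a parity flag ( PF ) that is updated from the low 8 bits of the result of every instruction that "sets flags according to the result", such as xor .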

We use the np condition because PF = 1 means even parity: the XOR of all the bits is 0. We need the inverse, so that we return 0 for even parity.

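To take advantage of it, we do a SIMD-style horizontal reduction: bring the high half down onto the low half and combine them, repeating twice to reduce 32 bits to 8 bits.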
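The same reduction can be sketched in C. Portable C cannot read PF, so this version simply keeps halving past 8 bits until a single bit is left; that swaps the flag trick for three extra shift/XOR steps and is not what the asm above does. The name parity_fold is made up for this example, and the sketch assumes unsigned is 32 bits wide.

```c
/* Sketch only: O(log2(n)) parity by repeated halving.
   parity_fold is a hypothetical name; assumes unsigned is 32 bits. */
int parity_fold(unsigned x)
{
    x ^= x >> 16;   /* 32 -> 16: fold the high half onto the low half */
    x ^= x >> 8;    /* 16 -> 8: the asm stops here and reads PF instead */
    x ^= x >> 4;    /* 8 -> 4 */
    x ^= x >> 2;    /* 4 -> 2 */
    x ^= x >> 1;    /* 2 -> 1: bit 0 is now the XOR of all 32 bits */
    return (int)(x & 1u);
}
```

Each step leaves the parity of the low half equal to the parity of the whole previous value, so only the lowest bit needs to be kept at the end.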

Zeroing eax (with an xor) before the instruction that sets the flags is slightly more efficient than doing set-flags / setp %al / movzbl %al, %eax , as I explained in What is the best way to set a register to zero in x86 assembly: xor, mov or and? .

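Or, as @EOF pointed out, if the CPUID POPCNT feature bit is set, you can use popcnt and test the low bit to see whether the number of set bits is even or odd. (Another way to look at this: xor is add-without-carry, so the low bit is the same whether you XOR all the bits together or add all the bits together horizontally.)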
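The add-without-carry observation can be checked with a small C sketch that counts the set bits with ordinary addition and then keeps only the low bit of the count. Both function names are made up for this example, and the bit-at-a-time loop is just a portable stand-in for the popcnt instruction, not a fast implementation.

```c
/* Sketch only: parity as the low bit of a population count.
   count_bits and parity_from_count are hypothetical names. */
unsigned count_bits(unsigned x)
{
    unsigned n = 0;
    while (x != 0) {
        n += x & 1u;   /* add the current low bit (with carry, unlike XOR) */
        x >>= 1;
    }
    return n;
}

int parity_from_count(unsigned x)
{
    /* The low bit of the sum equals the XOR of all the bits. */
    return (int)(count_bits(x) & 1u);
}
```

For 0xF0 the count is 4, so the parity is 0; for 7 the count is 3, so the parity is 1.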

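GNU C also has __builtin_parity and __builtin_popcount , which use the hardware instruction if you tell the compiler that the compile target supports it (with -march=... or -mpopcnt ), and otherwise compile to an efficient sequence for the target machine. The Intel intrinsics always compile to the machine instruction, not a fallback sequence, and it is a compile-time error to use them without the appropriate -mpopcnt target option.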

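Unfortunately, gcc doesn't recognize the pure-C loop as a parity calculation and optimize it into this. Some compilers (such as clang and gcc) can recognize certain kinds of popcount idioms and optimize them into the popcnt instruction, but that kind of pattern recognition doesn't happen in this case. :(

See these on godbolt.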

int parity_gnuc(unsigned x) {
    return  __builtin_parity(x);
}
    # with -mpopcnt, compiles the same as below
    # without popcnt, compiles to the same upper/lower half XOR algorithm I used, and a setnp
    # using one load and mov/shift for the 32->16 step, and still %dh, %dl for the 16->8 step.

#ifdef __POPCNT__
#include <immintrin.h>
int parity_popcnt(unsigned x) {
    return  _mm_popcnt_u32(x) & 1;
}
#endif

    # gcc does compile this to the optimal code:
    popcnt    4(%esp), %eax
    and       $1, %eax
    ret

See also the other links in the x86 tag wiki.

Regarding assembly - What is the purpose of the loop "xorl %edx,%eax; shrl $1,%edx"?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38886479/
