assembly - Is there a way to increment a value in an xmm register?


Reposted by: 行者123 | Updated: 2023-12-02 22:03:46

I'm wondering: is there a way to increment a value in an xmm register, or can you only move a value into one?

What I mean is, you can do this:

inc eax

or like this:

inc [ebp+7F00F000]

Is there a way to do the same thing with an xmm register?

I tried something like this, but... it doesn't work:

  inc [rbx+08]
  movss xmm1,[rbx+08]

I even tried something really silly, and that didn't work either:

push edx
pextrw edx,xmm2,0
add edx,1
mov [rbx+08],edx
movss xmm1,[rbx+08]
pop edx

Best answer

There's no equivalent of inc for xmm registers. There's also no immediate-operand form of paddw, so there's no equivalent of add eax, 1 either.

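paddw (and its versions for other element sizes) only accepts an xmm/m128 source operand. So if you want to increment one element of a vector, you need to load a constant from memory or generate it on the fly.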
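To make the "constant must be a whole vector" point concrete, here is a minimal C sketch with SSE2 intrinsics. The answer itself is plain assembly; C and an x86-64 target, where SSE2 is always available, are assumptions. The names `inc_word3` and `lane16` are hypothetical helpers. To bump a single 16-bit element, every other lane of the constant holds 0:

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: __m128i, _mm_add_epi16 (paddw) */
#include <stdint.h>

/* Hypothetical helper: add 1 to word element 3 only.
 * paddw has no immediate form, so the "1" has to be a full vector
 * with 0 in every lane that should stay unchanged. */
static __m128i inc_word3(__m128i v) {
    /* _mm_setr_epi16 lists lanes starting from element 0. */
    const __m128i one_in_lane3 = _mm_setr_epi16(0, 0, 0, 1, 0, 0, 0, 0);
    return _mm_add_epi16(v, one_in_lane3); /* paddw */
}

/* Hypothetical helper: read one 16-bit lane back out for checking. */
static int16_t lane16(__m128i v, int i) {
    int16_t out[8];
    assert(i >= 0 && i < 8);
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

A compiler will usually materialize a constant like `one_in_lane3` from read-only memory. The exact code it emits is up to the compiler.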

For example, the cheapest way to increment all the elements of xmm0 is:

; outside the loop
pcmpeqw    xmm1,xmm1     ; xmm1 = all-ones = -1

; inside the loop
psubw      xmm0, xmm1    ; xmm0 -= -1   (in each element).  i.e. xmm0++

or

paddw      xmm0, [ones]  ; where ones is a static constant.
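The first, register-only variant (pcmpeqw + psubw) can be sketched in C with SSE2 intrinsics. This assumes an x86-64 target; `inc_all_epi16` and `lane16` are hypothetical names:

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Increment every 16-bit element without a memory constant:
 * comparing a register with itself sets all bits (-1 per element),
 * and subtracting -1 adds 1. */
static __m128i inc_all_epi16(__m128i v) {
    __m128i minus_one = _mm_cmpeq_epi16(v, v); /* pcmpeqw: x == x in every lane */
    return _mm_sub_epi16(v, minus_one);        /* psubw: v - (-1) == v + 1 */
}

/* Hypothetical helper: read one 16-bit lane back out for checking. */
static int16_t lane16(__m128i v, int i) {
    int16_t out[8];
    assert(i >= 0 && i < 8);
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

Elements wrap like any 16-bit integer, so 32767 becomes -32768. An optimizing compiler is free to emit a different sequence with the same result, for example paddw with a constant of ones.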

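If constructing the constant takes more than two instructions, or if register pressure is a problem, loading the constant from memory can be a good idea.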
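The memory-constant variant (paddw xmm0, [ones]) might look like this in C. This is a sketch assuming C11 `_Alignas` and an x86-64 target. The array reuses the name `ones` from the assembly, and `inc_all_from_mem` and `lane16` are hypothetical. Non-VEX SSE instructions require 16-byte alignment for a 128-bit memory operand, which is why the array is aligned:

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Static, 16-byte-aligned vector of eight 1s, like "ones" in the asm. */
static _Alignas(16) const int16_t ones[8] = {1, 1, 1, 1, 1, 1, 1, 1};

static __m128i inc_all_from_mem(__m128i v) {
    /* paddw with a memory source; the compiler may fold this load
     * directly into paddw's m128 operand. */
    return _mm_add_epi16(v, _mm_load_si128((const __m128i *)ones));
}

/* Hypothetical helper: read one 16-bit lane back out for checking. */
static int16_t lane16(__m128i v, int i) {
    int16_t out[8];
    assert(i >= 0 && i < 8);
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```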


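For example, if you want to construct a constant that increments only the low 32-bit element, you can use a byte shift to zero the other elements: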

; hoisted out of the loop
pcmpeqw    xmm1,xmm1     ; xmm1 = all-ones = -1
psrldq     xmm1, 12      ; xmm1 = [ 0 0 0 -1 ]


; in the loop
psubd      xmm0, xmm1
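Here is a C sketch of the same trick with SSE2 intrinsics, assuming an x86-64 target (`inc_low_dword` and `lane32` are hypothetical names). The byte shift leaves -1 only in the low dword, so psubd changes element 0 and subtracts 0 from the rest:

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Increment only the low 32-bit element (element 0). */
static __m128i inc_low_dword(__m128i v) {
    __m128i all_ones = _mm_cmpeq_epi16(v, v);        /* pcmpeqw: -1 everywhere */
    __m128i low_only = _mm_srli_si128(all_ones, 12); /* psrldq 12: [0 0 0 -1], high..low */
    return _mm_sub_epi32(v, low_only);               /* psubd: element 0 += 1 */
}

/* Hypothetical helper: read one 32-bit lane back out for checking. */
static int32_t lane32(__m128i v, int i) {
    int32_t out[4];
    assert(i >= 0 && i < 4);
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

In a real loop you'd build `low_only` once outside the loop, as the assembly does. Compilers usually hoist it themselves.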


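If your attempt was meant to increment just the low 16-bit element of xmm2, then yes, it was a silly attempt. I don't know why you're storing into [rbx+8] and then loading into xmm1, because that load zeroes the upper 96 bits.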
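The "zeroes the upper 96 bits" point can be checked with the load form of movss. This C sketch assumes an x86-64 target, and `store_then_movss` and `lanef` are hypothetical names. The function mimics the question's "store to memory, then movss-load" step: element 0 gets the value, and elements 1 to 3 come back as zero, not as whatever the register held before.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: _mm_load_ss (movss xmm, m32) */

static __m128 store_then_movss(float value) {
    float mem = value;        /* like the store to [rbx+08] */
    return _mm_load_ss(&mem); /* movss xmm1, [rbx+08]: upper three elements zeroed */
}

/* Hypothetical helper: read one float lane back out for checking. */
static float lanef(__m128 v, int i) {
    float out[4];
    assert(i >= 0 && i < 4);
    _mm_storeu_ps(out, v);
    return out[i];
}
```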

Here's how to write the xmm -> GP -> xmm round trip in a less silly way (still terrible compared to paddw with a vector constant):

; don't push/pop.  Instead, pick a register you can clobber without saving/restoring
movd    edx, xmm2       ; cheapest way to get the low 16 bits.  Element 1 also lands in the high half of edx as garbage, which doesn't matter
inc     edx             ; we only care about dx, but this is still the most efficient instruction
pinsrw  xmm2, edx, 0    ; normally you'd just use movd again, but here we want to merge with the old contents
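Here is the same round trip in C with SSE2 intrinsics, assuming an x86-64 target (`inc_low_word_via_gpr` and `lane16` are hypothetical names). Unsigned arithmetic keeps the wrap-around well defined. Because pinsrw only takes the low 16 bits, a carry out of bit 15 lands in the garbage half and never reaches element 1:

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* xmm -> GPR -> xmm round trip that increments only word element 0. */
static __m128i inc_low_word_via_gpr(__m128i v) {
    /* movd edx, xmm2: low 32 bits, with word element 1 in the high half */
    uint32_t x = (uint32_t)_mm_cvtsi128_si32(v);
    x += 1u; /* inc edx */
    /* pinsrw xmm2, edx, 0: merge only the low 16 bits back into element 0 */
    return _mm_insert_epi16(v, (int)(x & 0xFFFFu), 0);
}

/* Hypothetical helper: read one 16-bit lane back out for checking. */
static int16_t lane16(__m128i v, int i) {
    int16_t out[8];
    assert(i >= 0 && i < 8);
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```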

If you want to work with elements of a size other than 16 bits, you can use the SSE4.1 pinsrb/d/q instructions, or you can use movd and shuffles.
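The "movd and shuffles" route without SSE4.1 could look roughly like the following, for 32-bit element 2. The particular shuffle sequence (pshufd, psrldq, punpckldq, punpcklqdq) is my own choice for illustration and is not taken from the answer. This is a sketch assuming SSE2 on x86-64, and `inc_dword2_via_gpr` and `lane32` are hypothetical names. With SSE4.1, pinsrd (`_mm_insert_epi32`) would replace the last four steps with one instruction.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Increment 32-bit element 2 through a GPR, SSE2 only.
 * Lanes are written low..high as [v0 v1 v2 v3]. */
static __m128i inc_dword2_via_gpr(__m128i v) {
    /* pshufd broadcasts v2 so movd can read it from lane 0 */
    uint32_t x = (uint32_t)_mm_cvtsi128_si32(_mm_shuffle_epi32(v, _MM_SHUFFLE(2, 2, 2, 2)));
    x += 1u;
    /* (int)x: out-of-range conversion is implementation-defined; GCC/Clang/MSVC wrap */
    __m128i t = _mm_cvtsi32_si128((int)x);  /* movd:       [x  0  0  0] */
    __m128i d = _mm_srli_si128(v, 12);      /* psrldq 12:  [v3 0  0  0] */
    __m128i hi = _mm_unpacklo_epi32(t, d);  /* punpckldq:  [x  v3 0  0] */
    return _mm_unpacklo_epi64(v, hi);       /* punpcklqdq: [v0 v1 x  v3] */
}

/* Hypothetical helper: read one 32-bit lane back out for checking. */
static int32_t lane32(__m128i v, int i) {
    int32_t out[4];
    assert(i >= 0 && i < 4);
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```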


See Agner Fog's Optimizing Assembly guide for more good tips on using SSE vectors, and the other links in the x86 tag wiki.

Original question on Stack Overflow (assembly - Is there a way to increment a value in an xmm register?): https://stackoverflow.com/questions/38294464/