
c++ - How to hide SHLD latency?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 12:06:08

I have a simple bit reader that uses the SHLD instruction ( __shiftleft128 ) to read bits from a bit stream.
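The question does not show the reader's source, so the following is only a minimal sketch of what such a reader might look like. `shift_left_128` is a portable stand-in written for this example (on MSVC x64 the `__shiftleft128` intrinsic plays that role), and `BitReader`, its members, and the assumption that the stream is stored as 64-bit words with the most significant bit first are all hypothetical. The disassembly in the question byte-swaps each word on load instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable stand-in for MSVC's __shiftleft128(low, high, shift): returns the
// upper 64 bits of the 128-bit value high:low shifted left by `shift`.
// Only the low 6 bits of the count are used, as with a 64-bit SHLD.
inline std::uint64_t shift_left_128(std::uint64_t low, std::uint64_t high,
                                    unsigned shift) {
    shift &= 63;
    if (shift == 0) return high;  // a 64-bit shift by 64 is undefined in C++
    return (high << shift) | (low >> (64 - shift));
}

// Hypothetical MSB-first bit reader (names are illustrative, not the asker's).
struct BitReader {
    const std::uint64_t* words;  // stream as 64-bit words, MSB first
    std::size_t index = 0;       // index of the current word
    unsigned bit_pos = 0;        // bits already consumed in the current word

    explicit BitReader(const std::uint64_t* w) : words(w) {}

    // Reads the next n bits (1 <= n <= 64). The buffer is assumed to be
    // padded with one extra word, because words[index + 1] is always loaded.
    std::uint64_t read(unsigned n) {
        assert(n >= 1 && n <= 64);
        std::uint64_t cur = words[index];
        std::uint64_t next = words[index + 1];
        // Align the unread bits to the top of a 64-bit value (the SHLD step).
        std::uint64_t aligned = shift_left_128(next, cur, bit_pos);
        std::uint64_t value = aligned >> (64 - n);
        bit_pos += n;
        if (bit_pos >= 64) {  // crossed into the next word
            bit_pos &= 63;
            ++index;
        }
        return value;
    }
};
```

In this sketch the value returned by each `read` depends on the SHLD result, which matches the dependency between `shld` and the instructions that follow it in the profile.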

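This works well. However, I have been doing some profiling, and I noticed that whatever instruction comes after the SHLD instruction takes a lot of time.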

    Assembly                          CPU Time        Instructions Retired
    add r10b, r9b                     19.000ms        92,000,000
    cmp r10b, 0x40                    58.000ms        180,000,000
    jb 0x140016fa6 <Block 24>
    Block 23:
    and r10b, 0x3f                    43.000ms        204,000,000
    mov r15, r11                      30.000ms        52,000,000
    mov qword ptr [rbp+0x20], r11
    add rbx, 0x8                      16.000ms        78,000,000
    mov qword ptr [rbp+0x10], rbx
    mov r11, qword ptr [rbx]          6.000ms         44,000,000
    bswap r11                         2.000ms
    mov qword ptr [rbp+0x28], r11     8.000ms         20,000,000
    Block 24:
    mov rdx, r15                      61.000ms        208,000,000
    movzx ecx, r10b                   1.000ms         6,000,000
    **shld** rdx, r11, cl             24.000ms        58,000,000
    inc edi                           **127.000ms**   470,000,000

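As the table above shows, the inc instruction that follows shld accounts for a large share of the time (8% of CPU time).

I would like to understand why this happens and how to avoid it. Are there any instructions that can run in parallel with shld at the CPU level?

I remember reading something about shld in one of the AMD optimization manuals, but I can no longer find it.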

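Best answer

It is hard to say, but the latency appears to be the result of some exception-handling routine.

Behavior

However, the Intel manual specifies some cases for shld that produce undefined results:-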

The destination operand can be a register or a memory location; the source operand is a register. The count operand is an unsigned integer that can be stored in an immediate byte or in the CL register. If the count operand is CL, the shift count is the logical AND of CL and a count mask. In non-64-bit modes and default 64-bit mode; only bits 0 through 4 of the count are used. This masks the count to a value between 0 and 31. If a count is greater than the operand size, the result is undefined.

If the count is 1 or greater, the CF flag is filled with the last bit shifted out of the destination operand and the SF, ZF, and PF flags are set according to the value of the result. For a 1-bit shift, the OF flag is set if a sign change occurred; otherwise, it is cleared. For shifts greater than 1 bit, the OF flag is undefined. If a shift occurs, the AF flag is undefined. If the count operand is 0, the flags are not affected. If the count is greater than the operand size, the flags are undefined.
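To make the count masking in the quoted text concrete, here is a small sketch that models only the count handling, not the instruction itself. It assumes, per the Intel manual's note that REX.W promotes the operation to 64 bits with a 6-bit count mask, that 64-bit operands use bits 0 through 5 of CL. The function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// Count mask applied to CL: 6 bits (0..63) for 64-bit operands,
// 5 bits (0..31) for 16- and 32-bit operands.
constexpr unsigned shld_count_mask(unsigned operand_bits) {
    return operand_bits == 64 ? 63u : 31u;
}

// The shift count the instruction actually uses after masking.
inline unsigned shld_effective_count(unsigned operand_bits, unsigned cl) {
    assert(operand_bits == 16 || operand_bits == 32 || operand_bits == 64);
    return cl & shld_count_mask(operand_bits);
}

// True when the masked count exceeds the operand size, which is the case
// the quoted text describes as giving an undefined result and flags.
inline bool shld_result_undefined(unsigned operand_bits, unsigned cl) {
    return shld_effective_count(operand_bits, cl) > operand_bits;
}
```

Under this model the undefined case is reachable only with 16-bit operands (masked counts 17 through 31). With the 64-bit operands used in the question, where the code also keeps the count in 0..63 via `and r10b, 0x3f`, the masked count can never exceed the operand size.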

shld exceptions:-

In Protected Mode --> #GP(0),#SS(0),#PF(fault-code),#AC(0),#UD

Update: FAQ:-->
First, the definition:-

Instructions Retired — Event select C0H, Umask 00H
This event counts the number of instructions at retirement. For instructions that consist of multiple micro-ops, this event counts the retirement of the last microop of the instruction. An instruction with a REP prefix counts as one instruction (not per iteration). Faults before the retirement of the last micro-op of a multiops instruction are not counted.
This event does not increment under VM-exit conditions. Counters continue counting during hardware interrupts, traps, and inside interrupt handlers.

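inc edi **127.000ms** 470,000,000 (Instructions Retired)
From the definition above it is clear that either this instruction breaks into too many micro-ops, or some interrupt handlers are running at the same time.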

Regarding "c++ - How to hide SHLD latency?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12011394/
