c++ - Assembly / inline __asm

Reposted. Author: 行者123. Updated: 2023-11-30 00:38:07

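I am learning assembly and doing some inline assembly in my Digital Mars C++ compiler. I searched for ways to make a program faster and found these points to tune a program: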

Use a better C++ compiler (I am thinking of GCC or the Intel compiler).

Use assembly only in the critical parts of the program.

Find a better algorithm.

Cache miss, cache contention.

Loop-carried dependency chain.

Instruction fetching time.

Instruction decoding time.

Instruction retirement.

Register read stalls.

Execution port throughput.

Execution unit throughput.

Suboptimal reordering and scheduling of micro-ops.

Branch misprediction.

Floating point exception.

I understand all of these except "register read stalls".

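Question: Can anyone explain how this happens inside the CPU, and what the "superscalar" form of "out-of-order execution" is? Plain "out-of-order" seems logical to me, but I could not find a logical explanation of the "superscalar" form.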

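Question 2: Can someone also point to a good instruction list for SSE, SSE2 and newer CPUs, preferably with micro-op tables, port throughput, execution units and some latency tables, so that I can find the real bottleneck in a piece of code?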

I would be happy with a small example like this:

// breaking a loop-carried dependency chain:
__asm
{
loop_begin:
....
....
sub edx,05h // rather than computing i*5 in each iteration, subtract 5 each iteration
sub ecx,01h // i-- (loop counter)
...
...
jnz loop_begin // edit: sub ecx must come after sub edx, and nothing between it and jnz may change the zero flag
}
// sub edx gets rid of a multiplication and also makes edx independent of ecx
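
To make the idea checkable outside of Digital Mars inline assembly, here is a minimal sketch of the same transformation in plain C++. The function names `offsets_multiply` and `offsets_running` are hypothetical, made up for this illustration. The first version recomputes `i * 5` from the counter on every iteration. The second keeps a running value and subtracts 5, which is what `sub edx,05h` does above, so the value no longer depends on the counter. Whether this is actually faster depends on the compiler and CPU, and an optimizing compiler may already do this on its own.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: produces i*5 for i = n, n-1, ..., 1
// using a multiply that depends on the loop counter each iteration.
std::vector<std::uint32_t> offsets_multiply(std::uint32_t n) {
    std::vector<std::uint32_t> out;
    out.reserve(n);
    for (std::uint32_t i = n; i != 0; --i) {
        out.push_back(i * 5u);  // value is derived from the counter
    }
    return out;
}

// Hypothetical helper: produces the same values, but keeps a running
// value that is stepped by 5, so no multiply is needed inside the loop.
std::vector<std::uint32_t> offsets_running(std::uint32_t n) {
    std::vector<std::uint32_t> out;
    out.reserve(n);
    std::uint32_t x = n * 5u;   // one multiply before the loop
    for (std::uint32_t i = n; i != 0; --i) {
        out.push_back(x);
        x -= 5u;                // plays the role of `sub edx,05h`
    }
    return out;
}
```

Both functions return the same sequence. Note that the running value still depends on its own previous value; what is removed is the multiply and the dependency on the counter.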

Thanks.

Computer: Pentium-M 2GHz, Windows XP 32-bit

Best answer

For c++ - Assembly / inline __asm, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11701888/
