
x86 - How has the evolution of CPU architecture affected virtual function call performance?

Reposted | Author: 行者123 | Updated: 2023-12-03 09:11:38

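Years ago I was learning about x86 assembly, CPU pipelining, cache misses, branch prediction, and all that jazz.

It was a tale of two halves. I read about all the wonderful advantages of long pipelines in the processor: instruction reordering, cache preloading, dependency interleaving, and so on.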

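The downside was that any deviation from the norm was enormously costly. For example, IIRC a certain AMD processor in the early-gigahertz era had a 40-cycle penalty every time you called a function through a pointer (!), and this was apparently normal.

That is not a "don't worry about it" number! Bear in mind that "good design" usually means "factor your functions as much as possible" and "encode semantics in the data types", which often implies virtual interfaces.

The trade-off is that code which does not do such things might get more than two instructions per cycle. These are the numbers one has to worry about when writing high-performance C++ code that is heavy on object design and light on number crunching.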

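As I understand it, the trend toward long CPU pipelines has been reversing as we enter the low-power era. Here is my question:

Does the latest generation of x86-compatible processors still suffer massive penalties for virtual function calls, bad branch predictions, etc.?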

Best answer

AMD processor in the early-gigahertz era had a 40 cycle penalty every time you called a function


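Heh... that big...
There is "indirect branch prediction", which helps predict the jump of a virtual function call if the same indirect jump was executed some time before. The first call, and any mispredicted virtual function jump, still pays the penalty.
Support ranges from a simple "predicted correctly if and only if the previous indirect branch went to exactly the same target" to very complex two-level schemes with tens or hundreds of entries, which can detect periodic alternation among 2-3 target addresses of a single indirect jmp instruction.
There has been a lot of evolution here...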
http://arstechnica.com/hardware/news/2006/04/core.ars/7

first introduced with the Pentium M: ... indirect branch predictor.

The indirect branch predictor

Because indirect branches load their branch targets from a register, instead of having them immediately available as is the case with direct branches, they're notoriously difficult to predict. Core's indirect branch predictor is a table that stores history information about the preferred target addresses of each indirect branch that the front end encounters. Thus when the front-end encounters an indirect branch and predicts it as taken, it can ask the indirect branch predictor to direct it to the address in the BTB that the branch will probably want.


http://www.realworldtech.com/page.cfm?ArticleID=rwt051607033728&p=3

Indirect branch prediction was first introduced with Intel’s Prescott microarchitecture and later the Pentium M.

between 16-50% of all branch mispredicts were indirect (29% on average). The real value of indirect branch misprediction is for many of the newer scripting or high level languages, such as Ruby, Perl or Python, which use interpreters. Other common indirect branch common culprits include virtual functions (used in C++) and calls to function pointers.


http://www.realworldtech.com/page.cfm?ArticleID=RWT102808015436&p=5

AMD has adopted some of these refinements; for instance adding indirect branch predictor arrays in Barcelona and later processors. However, the K8 has older and less accurate branch predictors than the Core 2.


http://www.agner.org/optimize/microarchitecture.pdf

3.12 Indirect jumps on older processors
Indirect jumps, indirect calls, and returns may go to a different address each time. The prediction method for an indirect jump or indirect call is, in processors older than PM and K10, simply to predict that it will go to the same target as last time it was executed.


And from the same PDF, page 14:

Indirect jump prediction
An indirect jump or call is a control transfer instruction that has more than two possible targets. A C++ program can generate an indirect jump or call with... a virtual function. An indirect jump or call is generated in assembly by specifying a register or a memory variable or an indexed array as the destination of a jump or call instruction. Many processors make only one BTB entry for an indirect jump or call. This means that it will always be predicted to go to the same target as it did last time.

As object oriented programming with polymorphous classes has become more common, there is a growing need for predicting indirect calls with multiple targets. This can be done by assigning a new BTB entry for every new jump target that is encountered. The history buffer and pattern history table must have space for more than one bit of information for each jump incident in order to distinguish more than two possible targets.

The PM is the first x86 processor to implement this method. The prediction rule on p. 12 still applies with the modification that the theoretical maximum period that can be predicted perfectly is m^n, where m is the number of different targets per indirect jump, because there are m^n different possible n-length subsequences. However, this theoretical maximum cannot be reached if it exceeds the size of the BTB or the pattern history table.


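Agner's manual has a much longer description of the branch predictors in many modern CPUs, and of how the predictors evolved in each manufacturer's x86/x86_64 CPUs.
There are also many theoretical "indirect branch prediction" methods (look at Google Scholar); even Wikipedia says something about them: http://en.wikipedia.org/wiki/Branch_predictor#Prediction_of_indirect_jumps/
For the Atom, from Agner's microarchitecture manual: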

Prediction of indirect branches
The Atom has no pattern predictor for indirect branches according to my tests. Indirect branches are predicted to go to the same target as last time.


So on low-power CPUs, indirect branch prediction is not that advanced. The same goes for the VIA Nano:

Indirect jumps are predicted to go to the same target as last time.


I think low-power x86 CPUs, with their shorter pipelines, have a lower misprediction penalty: 7-20 cycles.

A similar question, "x86 - How has the evolution of CPU architecture affected virtual function call performance?", can be found on Stack Overflow: https://stackoverflow.com/questions/7241922/
