x86 - How has the evolution of CPU architecture affected virtual function call performance?


Reposted by: 行者123 · Updated: 2023-12-03 09:11:38


A few years ago I was learning about x86 assembler, CPU pipelining, cache misses, branch prediction, and all that jazz.

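It's a tale of two halves. I read about all the wonderful advantages of long pipelines in the processor: instruction reordering, cache preloading, dependency interleaving, and so on.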

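The downside was that any deviation from the norm was very expensive. For example, IIRC a certain AMD processor in the early-gigahertz era had a 40-cycle penalty every time you called a function through a pointer (!), and this was apparently considered normal.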

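This is not a negligible "don't worry about it" number! Bear in mind that "good design" usually means "factor your functions as much as possible" and "encode semantics in the data types", which often implies virtual interfaces.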

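The trade-off is that code which avoids such constructs might achieve more than two instructions per cycle. These are the numbers one has to worry about when writing high-performance C++ code that is heavy on object design and light on number crunching.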
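To make the question concrete, here is a minimal C++ sketch of the kind of virtual interface being discussed. `Shape`, `Square`, `Rect` and `total_area` are hypothetical names chosen for illustration. The interesting part is the `p->area()` call in the loop: unless the compiler devirtualizes it, it is typically compiled to a load of the object's vtable pointer, a load of the function pointer from the vtable slot, and an indirect call, so the target is only known at run time and the CPU has to predict it.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical polymorphic interface, for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

struct Rect : Shape {
    double w, h;
    Rect(double w_, double h_) : w(w_), h(h_) {}
    double area() const override { return w * h; }
};

// Sums the areas through the base-class interface.
double total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& p : shapes) {
        assert(p != nullptr);
        // Virtual call: usually vtable load + slot load + indirect call.
        sum += p->area();
    }
    return sum;
}
```

Whether a particular call site really ends up as an indirect call depends on the compiler and optimization level; a compiler may devirtualize the call when it can prove the dynamic type (for example, through a `final` class).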

As I understand it, the trend toward long CPU pipelines has been reversing as we enter the low-power era. Here is my question:

Does the latest generation of x86-compatible processors still pay a massive penalty for virtual function calls, mispredicted branches, etc.?

Best answer

AMD processor in the early-gigahertz era had a 40 cycle penalty every time you called a function

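Huh... that large...
There is an "indirect branch prediction" mechanism which helps predict virtual-function jumps, provided the same indirect jump was seen some time ago. The first virtual-function jump, and any mispredicted one, still pays the penalty.
Support ranges from the simple "predicted correctly if and only if the previous indirect branch went to exactly the same target" to very complex two-level predictors with tens or hundreds of entries that can detect a periodic alternation among 2-3 target addresses of a single indirect jmp instruction.
There has been a lot of evolution here...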
http://arstechnica.com/hardware/news/2006/04/core.ars/7

first introduced with the Pentium M: ... indirect branch predictor.

The indirect branch predictor

Because indirect branches load their branch targets from a register, instead of having them immediately available as is the case with direct branches, they're notoriously difficult to predict. Core's indirect branch predictor is a table that stores history information about the preferred target addresses of each indirect branch that the front end encounters. Thus when the front-end encounters an indirect branch and predicts it as taken, it can ask the indirect branch predictor to direct it to the address in the BTB that the branch will probably want.

http://www.realworldtech.com/page.cfm?ArticleID=rwt051607033728&p=3

Indirect branch prediction was first introduced with Intel’s Prescott microarchitecture and later the Pentium M.

between 16-50% of all branch mispredicts were indirect (29% on average). The real value of indirect branch misprediction is for many of the newer scripting or high level languages, such as Ruby, Perl or Python, which use interpreters. Other common indirect branch common culprits include virtual functions (used in C++) and calls to function pointers.
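The interpreter case mentioned in the quote can be sketched as a dispatch loop over a table of function pointers. This is a hypothetical toy stack machine (`Op`, `Vm`, `kHandlers` and `run` are illustrative names, not taken from any real interpreter): every opcode goes through the same indirect call site, so that one branch jumps to many different targets depending on the bytecode being executed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy bytecode: each opcode indexes the handler table below.
enum Op : std::uint8_t { PUSH1 = 0, ADD = 1, MUL = 2, DUP = 3 };

// The interpreter state is just an operand stack.
struct Vm {
    std::vector<long> stack;
};

using Handler = void (*)(Vm&);

void op_push1(Vm& vm) { vm.stack.push_back(1); }

void op_add(Vm& vm) {
    assert(vm.stack.size() >= 2);
    long rhs = vm.stack.back();
    vm.stack.pop_back();
    vm.stack.back() += rhs;
}

void op_mul(Vm& vm) {
    assert(vm.stack.size() >= 2);
    long rhs = vm.stack.back();
    vm.stack.pop_back();
    vm.stack.back() *= rhs;
}

void op_dup(Vm& vm) {
    assert(!vm.stack.empty());
    long top = vm.stack.back();  // copy the value before growing the vector
    vm.stack.push_back(top);
}

// Handler table indexed by opcode value; order must match enum Op.
const Handler kHandlers[] = {op_push1, op_add, op_mul, op_dup};
constexpr std::size_t kNumOps = sizeof(kHandlers) / sizeof(kHandlers[0]);

// Runs the bytecode and returns the value left on top of the stack.
long run(const std::vector<std::uint8_t>& code) {
    Vm vm;
    for (std::uint8_t op : code) {
        assert(static_cast<std::size_t>(op) < kNumOps);
        // Single indirect call site whose target changes with every opcode.
        kHandlers[op](vm);
    }
    assert(!vm.stack.empty());
    return vm.stack.back();
}
```

A `switch`-based dispatch loop is often compiled to a jump table, which is likewise an indirect branch, so it runs into the same prediction problem.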

http://www.realworldtech.com/page.cfm?ArticleID=RWT102808015436&p=5

AMD has adopted some of these refinements; for instance adding indirect branch predictor arrays in Barcelona and later processors. However, the K8 has older and less accurate branch predictors than the Core 2.

http://www.agner.org/optimize/microarchitecture.pdf

3.12 Indirect jumps on older processors
Indirect jumps, indirect calls, and returns may go to a different address each time. The prediction method for an indirect jump or indirect call is, in processors older than PM and K10, simply to predict that it will go to the same target as last time it was executed.
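The "same target as last time" rule is easy to model in software. The sketch below is a simplified model built on stated assumptions, not a description of any real BTB: it keeps one entry per indirect branch holding only the most recent target, ignores table size and tagging, and counts how often the rule guesses wrong for a given sequence of targets. `LastTargetPredictor` and `count_mispredicts` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Simplified model of "predict the same target as last time":
// one entry per indirect branch, holding only its most recent target.
// Real BTBs are finite, tagged and shared; none of that is modelled here.
class LastTargetPredictor {
public:
    // Returns true if the stored target for `branch` equals `actual`,
    // then records `actual` as the new last target.
    bool predict_and_update(std::uint64_t branch, std::uint64_t actual) {
        auto it = last_target_.find(branch);
        bool hit = it != last_target_.end() && it->second == actual;
        last_target_[branch] = actual;
        return hit;
    }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> last_target_;
};

// Counts mispredictions for one call site that jumps to `targets` in order.
std::size_t count_mispredicts(const std::vector<std::uint64_t>& targets) {
    constexpr std::uint64_t kCallSite = 0x1000;  // arbitrary branch address
    LastTargetPredictor predictor;
    std::size_t misses = 0;
    for (std::uint64_t target : targets) {
        if (!predictor.predict_and_update(kCallSite, target)) {
            ++misses;
        }
    }
    return misses;
}
```

Under this rule a call site that always reaches the same override costs at most one miss (the cold start), whereas a call site that alternates between two overrides, such as a loop over a container holding two derived types in alternating order, mispredicts on every call.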

And from the same PDF, page 14:

Indirect jump prediction
An indirect jump or call is a control transfer instruction that has more than two possible targets. A C++ program can generate an indirect jump or call with... a virtual function. An indirect jump or call is generated in assembly by specifying a register or a memory variable or an indexed array as the destination of a jump or call instruction. Many processors make only one BTB entry for an indirect jump or call. This means that it will always be predicted to go to the same target as it did last time. As object oriented programming with polymorphous classes has become more common, there is a growing need for predicting indirect calls with multiple targets. This can be done by assigning a new BTB entry for every new jump target that is encountered. The history buffer and pattern history table must have space for more than one bit of information for each jump incident in order to distinguish more than two possible targets. The PM is the first x86 processor to implement this method. The prediction rule on p. 12 still applies with the modification that the theoretical maximum period that can be predicted perfectly is m^n, where m is the number of different targets per indirect jump, because there are m^n different possible n-length subsequences. However, this theoretical maximum cannot be reached if it exceeds the size of the BTB or the pattern history table.
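The m^n bound comes from indexing predictions by recent history: with m possible targets and a history of the last n targets, there are m^n distinct histories. Below is a toy model of a history-based (two-level) indirect predictor, assuming the history is the last `n` targets of the same branch and the pattern table has unlimited capacity, so it shows the theoretical best case rather than real hardware limits. `count_mispredicts_history` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Toy two-level indirect-branch predictor for a single call site.
// Level 1: the last `n` targets this branch jumped to (its history).
// Level 2: a pattern table mapping each history to the target that followed
// it the previous time. The table is unbounded, so this shows the theoretical
// best case; real predictors are limited by BTB / pattern-table size.
std::size_t count_mispredicts_history(const std::vector<std::uint64_t>& targets,
                                      std::size_t n) {
    assert(n >= 1 && "history must hold at least one target");
    std::vector<std::uint64_t> history;  // oldest target first
    std::map<std::vector<std::uint64_t>, std::uint64_t> pattern_table;
    std::size_t misses = 0;

    for (std::uint64_t target : targets) {
        bool hit = false;
        if (history.size() == n) {  // no prediction until the history is full
            auto it = pattern_table.find(history);
            hit = it != pattern_table.end() && it->second == target;
            pattern_table[history] = target;  // learn what followed this history
        }
        if (!hit) {
            ++misses;
        }
        history.push_back(target);
        if (history.size() > n) {
            history.erase(history.begin());
        }
    }
    return misses;
}
```

With `n = 1`, a strict A, B, A, B alternation is mispredicted only while the history and table warm up, whereas the last-target rule quoted above misses on every call. A pattern such as A, A, B needs `n = 2`, because with one target of history the predictor cannot tell whether A or B follows A; longer periods need longer histories, up to the m^n limit.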

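Agner's manual gives a longer description of the branch predictors in many modern CPUs, and of how the predictors evolved in each manufacturer's (x86/x86_64) CPUs.
There are also many theoretical methods of indirect branch prediction (see Google Scholar); even Wikipedia has something to say: http://en.wikipedia.org/wiki/Branch_predictor#Prediction_of_indirect_jumps/
For Atom, from Agner's microarchitecture manual: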

Prediction of indirect branches
The Atom has no pattern predictor for indirect branches according to my tests. Indirect branches are predicted to go to the same target as last time.

So in low-power parts, indirect branch prediction is not that advanced. The same is true of the Via Nano:

Indirect jumps are predicted to go to the same target as last time.

I think the shorter pipelines of low-power x86 chips have a lower misprediction penalty, around 7-20 cycles.

A similar question about this topic can be found on Stack Overflow: https://stackoverflow.com/questions/7241922/
