x86 - BTB size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake?

Reposted. Author: 行者123. Updated: 2023-12-04 16:10:04
Is there any way to determine, or any resource that documents, the size of the branch target buffer (BTB) on Intel's Haswell, Sandy Bridge, Ivy Bridge, and Skylake processors?

Best answer

Check Agner Fog's software optimization resources: http://www.agner.org/optimize/

The BTB should be covered in "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers": http://www.agner.org/optimize/microarchitecture.pdf

3.7 Branch prediction in Intel Sandy Bridge and Ivy Bridge

BTB organization. The branch target buffer in Sandy Bridge is bigger than in Nehalem according to unofficial rumors. It is unknown whether it has one level, as in Core 2 and earlier processors, or two levels as in Nehalem. It can handle a maximum of four call instructions per 16 bytes of code. Conditional jumps are less efficient if there are more than 3 branch instructions per 16 bytes of code.

3.8 Branch prediction in Intel Haswell, Broadwell and Skylake

BTB organization. The organization of the branch target buffer is unknown. It appears to be reasonably big.



Intel may give some data in the "Intel 64 and IA-32 Architectures Optimization Reference Manual" ( http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html ), around section "3.4.1 Branch Prediction Optimization", but it still gives no sizes.

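It may seem strange, but CPUID carried no BTB information back in 1998-2000: http://www.installaware.com/forums/oldattachments/02142006163/tstcpuid.c (written by Gerald J. Heim, University of Tübingen, Germany). It is still not listed in http://www.felixcloutier.com/x86/CPUID.html or in any public material from Intel employees. The comment in that file reads: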

 * This table describes the possible cache and TLB configurations
* as documented by Intel. For now AMD doesn't use this but gives
* exact cache layout data on CPUID 0x8000000x.
*
* MAX_CACHE_FEATURES_ITERATIONS limits the possible cache information
* to 80 bytes (of which 16 bytes are used in generic Pentii2).
* With 80 possible caches we are on the safe side for one or two years.
*
* Strange enough no BHT, BTB or return stack data is given this way...


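There should be some performance monitoring unit (PMU) counters for the BTB, and there are experiments that estimate the BTB size by running special test programs. See http://xania.org/201602/haswell-and-ivy-btb by Matt Godbolt: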

Conclusions

From these results, it seems Ivy Bridge (and therefore probably Sandy Bridge) uses pretty much the same strategy for BTB lookups of unconditional branches, albeit with a larger table size: 4096 entries split over 1024 sets of 4 ways.

For Haswell it seems a new approach for determining sets has been taken, along with a new approach to evicting entries.
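To make the "4096 entries split over 1024 sets of 4 ways" figure concrete, here is a minimal simulation sketch of how a loop of branches would behave in such a table. The indexing function (address bits above bit 4, modulo the set count) and the LRU replacement policy are assumptions made for illustration only; the quoted post does not confirm them for Ivy Bridge, and it says Haswell behaves differently.

```python
from collections import OrderedDict


def steady_state_misses(num_branches, stride, sets=1024, ways=4,
                        index_shift=4, passes=3):
    """Count misses on the last pass over a loop of evenly spaced branches.

    Assumed model (not confirmed by the source): the set index is
    (address >> index_shift) % sets, and each set evicts in LRU order.
    """
    # One LRU-ordered dict per set; keys are branch addresses.
    btb = [OrderedDict() for _ in range(sets)]
    addresses = [i * stride for i in range(num_branches)]
    misses = 0
    for _ in range(passes):
        misses = 0  # only the final pass is reported
        for addr in addresses:
            entries = btb[(addr >> index_shift) % sets]
            if addr in entries:
                entries.move_to_end(addr)        # refresh LRU position
            else:
                misses += 1
                if len(entries) >= ways:
                    entries.popitem(last=False)  # evict least recently used
                entries[addr] = True
    return misses


if __name__ == "__main__":
    print(steady_state_misses(4096, 16))       # fits exactly in 1024 x 4
    print(steady_state_misses(4097, 16))       # one set now holds 5 branches
    print(steady_state_misses(5, 16 * 1024))   # 5 branches aliasing to one set
```

Under these assumptions, 4096 branches spaced 16 bytes apart produce no steady-state misses, while adding one more branch makes a single set thrash. Five branches spaced 16 KiB apart all land in the same set and miss on every pass, which is the kind of stride-dependent cliff that such experiments look for when inferring the number of sets and ways.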



He also has more posts on branch prediction and its events:
  • http://xania.org/201602/bpu-part-one Static branch prediction on newer Intel processors
  • http://xania.org/201602/bpu-part-two Branch prediction - part two
  • http://xania.org/201602/bpu-part-three The BTB in contemporary Intel chips
  • http://xania.org/201602/bpu-part-four Branch Target Buffer, part 2

  • His code is public and based on Agner's tests: https://github.com/mattgodbolt/agner , in particular https://github.com/mattgodbolt/agner/blob/master/tests/btb_size.py and https://github.com/mattgodbolt/agner/blob/master/tests/branch.py
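As a rough idea of what a "special test program" for this can look like, the sketch below generates NASM-style source for a chain of unconditional jumps, each placed on its own aligned boundary. It is not taken from the repository above: the function name `make_jump_chain`, the label names, and the layout are hypothetical. Sweeping the branch count and the alignment, then reading a mispredict counter from the PMU around the generated code, is what would expose the table's capacity; that measurement step is not shown here.

```python
def make_jump_chain(num_branches, alignment):
    """Return NASM-style text for num_branches jumps, one per aligned slot.

    Hypothetical helper for illustration; alignment must be a power of two
    because NASM's align directive requires it.
    """
    if alignment & (alignment - 1) or alignment <= 0:
        raise ValueError("alignment must be a positive power of two")
    lines = ["; hypothetical test body: a chain of unconditional jumps"]
    for i in range(num_branches):
        lines.append(f"align {alignment}")
        lines.append(f"branch_{i}:")
        lines.append(f"    jmp branch_{i + 1}")  # jump to the next slot
    # Final landing label so the last jump has a target.
    lines.append(f"align {alignment}")
    lines.append(f"branch_{num_branches}:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_jump_chain(3, 16))
```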

Regarding "x86 - BTB size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38512886/
