performance - Why does cachegrind ignore the L3 cache, contradicting the documentation?

Reposted by: 行者123 Updated: 2023-12-04 02:49:04

I want to understand how people do cache optimization, and a friend recommended cachegrind as a useful tool for that.

Valgrind is a CPU simulator, and when cachegrind is used it assumes a two-level cache, as the documentation states:

Cachegrind simulates how your program interacts with a machine's cache hierarchy and (optionally) branch predictor. It simulates a machine with independent first-level instruction and data caches (I1 and D1), backed by a unified second-level cache (L2). This exactly matches the configuration of many modern machines.

The next paragraph continues:

However, some modern machines have three or four levels of cache. For these machines (in the cases where Cachegrind can auto-detect the cache configuration) Cachegrind simulates the first-level and last-level caches. The reason for this choice is that the last-level cache has the most influence on runtime, as it masks accesses to main memory.

However, when I ran valgrind on my simple matrix-matrix multiplication code, I got the following output.

==6556== Cachegrind, a cache and branch-prediction profiler
==6556== Copyright (C) 2002-2010, and GNU GPL'd, by Nicholas Nethercote et al.
==6556== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==6556== Command: ./a.out
==6556== 
--6556-- warning: L3 cache detected but ignored
==6556== 
==6556== I   refs:      50,986,869
==6556== I1  misses:         1,146
==6556== L2i misses:         1,137
==6556== I1  miss rate:       0.00%
==6556== L2i miss rate:       0.00%
==6556== 
==6556== D   refs:      20,232,408  (18,893,241 rd   + 1,339,167 wr)
==6556== D1  misses:       150,194  (   144,869 rd   +     5,325 wr)
==6556== L2d misses:        10,451  (     5,506 rd   +     4,945 wr)
==6556== D1  miss rate:        0.7% (       0.7%     +       0.3%  )
==6556== L2d miss rate:        0.0% (       0.0%     +       0.3%  )
==6556== 
==6556== L2 refs:          151,340  (   146,015 rd   +     5,325 wr)
==6556== L2 misses:         11,588  (     6,643 rd   +     4,945 wr)
==6556== L2 miss rate:         0.0% (       0.0%     +       0.3%  )
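
The percentages in this output can be checked by hand. The sketch below recomputes three of them from the counts shown; the helper names are illustrative and are not part of Cachegrind. The printed L2 miss rate of 0.0% is consistent with dividing L2 misses by all instruction and data references rather than by L2 refs. That reading is inferred from the numbers above, not taken from Cachegrind's source.

```c
#include <assert.h>

/* Counts copied from the Cachegrind output above. */
static const unsigned long long I_REFS    = 50986869ULL;
static const unsigned long long D_REFS    = 20232408ULL;
static const unsigned long long D1_MISSES = 150194ULL;
static const unsigned long long L2_REFS   = 151340ULL;
static const unsigned long long L2_MISSES = 11588ULL;

/* Miss rate in percent: misses relative to a chosen reference count. */
double miss_rate_pct(unsigned long long misses, unsigned long long refs)
{
    assert(misses <= refs);  /* every miss is one of the counted references */
    return refs == 0 ? 0.0 : 100.0 * (double)misses / (double)refs;
}

/* D1 misses relative to all data references. */
double d1_miss_rate(void)
{
    return miss_rate_pct(D1_MISSES, D_REFS);
}

/* L2 misses relative to every instruction and data reference (assumed denominator). */
double l2_miss_rate_vs_all_refs(void)
{
    return miss_rate_pct(L2_MISSES, I_REFS + D_REFS);
}

/* L2 misses relative to the references that actually reached L2. */
double l2_miss_rate_vs_l2_refs(void)
{
    return miss_rate_pct(L2_MISSES, L2_REFS);
}
```

With these counts the three values come out near 0.74%, 0.016%, and 7.7%. Only the first two match the 0.7% and 0.0% that Cachegrind prints, so a very low last-level miss rate in the summary does not by itself mean that few L1 misses go on to main memory.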

According to the documentation, the L1 and L3 caches should be used, but the output says the L3 cache is ignored. Why is that?

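Also, does cachegrind assume fixed sizes for the L1 and last-level caches in advance, or does it use the L1 and last-level cache sizes of the CPU it is currently running on?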
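
The matrix-matrix multiplication code from the question is not shown, so the following is only an assumed stand-in for that kind of program. It gives two loop orders over row-major square matrices. The function names and the row-major layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* c = a * b for n-by-n row-major matrices, naive i-j-k loop order.
 * The inner loop reads b with stride n, which tends to miss in D1 for large n. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0 && a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Same product with i-k-j order: the inner loop is unit stride on b and c. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0 && a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = 0.0;
        for (size_t k = 0; k < n; k++) {
            const double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Calling each variant from a small driver and running the binary with `valgrind --tool=cachegrind ./a.out` would let the D1 and last-level miss counts of the two loop orders be compared. The simulated cache geometry can also be set explicitly on the command line with the `--I1`, `--D1`, and `--LL` options described in the Cachegrind manual (older releases, such as the 3.6 series shown above, use `--L2` for the last of these).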

Best answer

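You are running on an Intel CPU that cachegrind does not appear to fully support. It checks the cpuid flags and determines support with a large case statement covering the different processors.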

This is from an unofficial copy of the code, but it is illustrative - https://github.com/koriakin/valgrind/blob/master/cachegrind/cg-x86-amd64.c :

/* Intel method is truly wretched.  We have to do an insane indexing into an
 * array of pre-defined configurations for various parts of the memory
 * hierarchy.
 * According to Intel Processor Identification, App Note 485.
 */
static
Int Intel_cache_info(Int level, cache_t* I1c, cache_t* D1c, cache_t* L2c)
{
...
      case 0x22: case 0x23: case 0x25: case 0x29:
      case 0x46: case 0x47: case 0x4a: case 0x4b: case 0x4c: case 0x4d:
      case 0xe2: case 0xe3: case 0xe4: case 0xea: case 0xeb: case 0xec:
          VG_(dmsg)("warning: L3 cache detected but ignored\n");
          break;
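
The effect of that case group can be sketched in isolation. The helper below is hypothetical and is not Valgrind code: it treats its input as cache descriptor bytes of the kind the Intel method decodes, and reports which ones belong to the group that triggers the "L3 cache detected but ignored" warning in the excerpt above. The descriptor values are copied from that excerpt; everything else is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the descriptor byte is one of the L3 descriptors listed in the
 * excerpt above, i.e. a case that only emits the warning. */
bool is_ignored_l3_descriptor(unsigned char d)
{
    switch (d) {
    case 0x22: case 0x23: case 0x25: case 0x29:
    case 0x46: case 0x47: case 0x4a: case 0x4b: case 0x4c: case 0x4d:
    case 0xe2: case 0xe3: case 0xe4: case 0xea: case 0xeb: case 0xec:
        return true;
    default:
        return false;
    }
}

/* Count how many descriptor bytes in a list fall into the ignored-L3 group. */
size_t count_ignored_l3(const unsigned char *desc, size_t len)
{
    assert(desc != NULL || len == 0);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (is_ignored_l3_descriptor(desc[i]))
            count++;
    }
    return count;
}
```

In this copy of the source, an L3 descriptor leads only to the warning and a `break`, which matches the output in the question: the L3 is reported as detected but ignored, and the summary lines are labelled L2.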

Regarding "performance - Why does cachegrind ignore the L3 cache, contradicting the documentation?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20850023/


