gpt4 book ai didi

performance - 为什么 cachegrind 忽略了 L3 缓存,这与文档相矛盾?

转载 作者:行者123 更新时间:2023-12-04 02:49:04 24 4
gpt4 key购买 nike

我想了解人们如何进行缓存优化, friend 向我推荐了 cachegrind作为实现这一目标的有用工具。

Valgrind 是一个 CPU 模拟器,假设有一个 2 级缓存,如前所述 here , 当使用 cachegrind 时

Cachegrind simulates how your program interacts with a machine's cache hierarchy and (optionally) branch predictor. It simulates a machine with independent first-level instruction and data caches (I1 and D1), backed by a unified second-level cache (L2). This exactly matches the configuration of many modern machines.

下一段继续为

However, some modern machines have three or four levels of cache. For these machines (in the cases where Cachegrind can auto-detect the cache configuration) Cachegrind simulates the first-level and last-level caches. The reason for this choice is that the last-level cache has the most influence on runtime, as it masks accesses to main memory.

然而,当我尝试在我的简单矩阵-矩阵乘法代码上运行 valgrind 时,我得到以下输出。

==6556== Cachegrind, a cache and branch-prediction profiler
==6556== Copyright (C) 2002-2010, and GNU GPL'd, by Nicholas Nethercote et al.
==6556== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==6556== Command: ./a.out
==6556==
--6556-- warning: L3 cache detected but ignored
==6556==
==6556== I refs: 50,986,869
==6556== I1 misses: 1,146
==6556== L2i misses: 1,137
==6556== I1 miss rate: 0.00%
==6556== L2i miss rate: 0.00%
==6556==
==6556== D refs: 20,232,408 (18,893,241 rd + 1,339,167 wr)
==6556== D1 misses: 150,194 ( 144,869 rd + 5,325 wr)
==6556== L2d misses: 10,451 ( 5,506 rd + 4,945 wr)
==6556== D1 miss rate: 0.7% ( 0.7% + 0.3% )
==6556== L2d miss rate: 0.0% ( 0.0% + 0.3% )
==6556==
==6556== L2 refs: 151,340 ( 146,015 rd + 5,325 wr)
==6556== L2 misses: 11,588 ( 6,643 rd + 4,945 wr)
==6556== L2 miss rate: 0.0% ( 0.0% + 0.3% )

根据文档,应该使用 L1 和 L3 缓存,但输出显示 L3 缓存被忽略。这是为什么?

此外,cachegrind 是否预先假定 L1 和最后一级缓存大小是多少,或者它是否使用当前运行的 CPU 的 L1 和最后一级缓存大小?

最佳答案

您在 cachegrind 似乎没有完全支持的英特尔 CPU 上运行。他们检查 cpuid 标志并根据针对不同处理器的大量案例语句确定支持。

这是来自代码的非官方副本,但只是说明性的 - https://github.com/koriakin/valgrind/blob/master/cachegrind/cg-x86-amd64.c :

/* Intel method is truly wretched.  We have to do an insane indexing into an
* array of pre-defined configurations for various parts of the memory
* hierarchy.
* According to Intel Processor Identification, App Note 485.
*/
static
Int Intel_cache_info(Int level, cache_t* I1c, cache_t* D1c, cache_t* L2c)
{
...
case 0x22: case 0x23: case 0x25: case 0x29:
case 0x46: case 0x47: case 0x4a: case 0x4b: case 0x4c: case 0x4d:
case 0xe2: case 0xe3: case 0xe4: case 0xea: case 0xeb: case 0xec:
VG_(dmsg)("warning: L3 cache detected but ignored\n");
break;

关于performance - 为什么 cachegrind 忽略了 L3 缓存,这与文档相矛盾?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/20850023/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com