c - Different read and write counts using cachegrind and callgrind


Reposted by: 太空狗 · Updated: 2023-10-29 16:36:36


I'm doing some experiments with Cachegrind, Callgrind and Gem5. I noticed that many accesses are counted as reads by cachegrind, as writes by callgrind, and as both reads and writes by gem5.

Let's take a very simple example:

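/* Ten increments; per the asm listing, each l++ is a single addl $1, -8(%rbp). */
#define INC10 l++; l++; l++; l++; l++; l++; l++; l++; l++; l++;

int main(void) {
    int i, l = 0;   /* initialize l: incrementing an uninitialized int is undefined */

    for (i = 0; i < 1000; i++) {
        /* 10 x INC10 = 100 increments per iteration */
        INC10 INC10 INC10 INC10 INC10
        INC10 INC10 INC10 INC10 INC10
    }
    return 0;
}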

I compile it with:

gcc ex.c --static -o ex

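So basically, according to the asm file, addl $1, -8(%rbp) is executed 100,000 times. Since it is both a read and a write, I expected 100k reads and 100k writes. However, cachegrind counts them only as reads, and callgrind counts them only as writes.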

 % valgrind --tool=cachegrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15356== Cachegrind, a cache and branch-prediction profiler
==15356== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
==15356== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15356== Command: ./ex
==15356== 
--15356-- warning: L3 cache found, using its data for the LL simulation.
==15356== 
==15356== I   refs:      111,535
==15356== I1  misses:        475
==15356== LLi misses:        280
==15356== I1  miss rate:    0.42%
==15356== LLi miss rate:    0.25%
==15356== 
==15356== D   refs:      104,894  (103,791 rd   + 1,103 wr)
==15356== D1  misses:        557  (    414 rd   +   143 wr)
==15356== LLd misses:        172  (     89 rd   +    83 wr)
==15356== D1  miss rate:     0.5% (    0.3%     +  12.9%  )
==15356== LLd miss rate:     0.1% (    0.0%     +   7.5%  )
==15356== 
==15356== LL refs:         1,032  (    889 rd   +   143 wr)
==15356== LL misses:         452  (    369 rd   +    83 wr)
==15356== LL miss rate:      0.2% (    0.1%     +   7.5%  )

-

 % valgrind --tool=callgrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15376== Callgrind, a call-graph generating cache profiler
==15376== Copyright (C) 2002-2012, and GNU GPL'd, by Josef Weidendorfer et al.
==15376== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15376== Command: ./ex
==15376== 
--15376-- warning: L3 cache found, using its data for the LL simulation.
==15376== For interactive control, run 'callgrind_control -h'.
==15376== 
==15376== Events    : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
==15376== Collected : 111532 2777 102117 474 406 151 279 87 85
==15376== 
==15376== I   refs:      111,532
==15376== I1  misses:        474
==15376== LLi misses:        279
==15376== I1  miss rate:    0.42%
==15376== LLi miss rate:    0.25%
==15376== 
==15376== D   refs:      104,894  (2,777 rd + 102,117 wr)
==15376== D1  misses:        557  (  406 rd +     151 wr)
==15376== LLd misses:        172  (   87 rd +      85 wr)
==15376== D1  miss rate:     0.5% ( 14.6%   +     0.1%  )
==15376== LLd miss rate:     0.1% (  3.1%   +     0.0%  )
==15376== 
==15376== LL refs:         1,031  (  880 rd +     151 wr)
==15376== LL misses:         451  (  366 rd +      85 wr)
==15376== LL miss rate:      0.2% (  0.3%   +     0.0%  )

Can someone give me a reasonable explanation? Am I right in thinking that there are actually ~100k reads and ~100k writes (i.e., 2 cache accesses per addl)?

Best answer

From the Cachegrind manual, section 5.7.1, "Cache Simulation Specifics":

Instructions that modify a memory location (e.g. inc and dec) are counted as doing just a read, i.e. a single data reference. This may seem strange, but since the write can never cause a miss (the read guarantees the block is in the cache) it's not very interesting.

Thus it measures not the number of times the data cache is accessed, but the number of times a data cache miss could occur.

Callgrind's cache simulation logic apparently differs from Cachegrind's here. I would expect callgrind to produce the same results as cachegrind, so this may be a bug.

Regarding "c - Different read and write counts using cachegrind and callgrind", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15790541/
