c++ - gprof vs. cachegrind profiles


Reposted by: 可可西里. Updated: 2023-11-01 15:05:00


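While trying to optimize my code, I am a bit puzzled by the differences between the profiles produced by kcachegrind and gprof. Specifically, if I use gprof (compiling with the -pg switch, etc.), I get this: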

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 89.62      3.71     3.71   204626     0.02     0.02  objR<true>::R_impl(std::vector<coords_t, std::allocator<coords_t> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&) const
  5.56      3.94     0.23 18018180     0.00     0.00  W2(coords_t const&, coords_t const&)
  3.87      4.10     0.16   200202     0.00     0.00  build_matrix(std::vector<coords_t, std::allocator<coords_t> > const&)
  0.24      4.11     0.01   400406     0.00     0.00  std::vector<double, std::allocator<double> >::vector(std::vector<double, std::allocator<double> > const&)
  0.24      4.12     0.01   100000     0.00     0.00  Wrat(std::vector<coords_t, std::allocator<coords_t> > const&, std::vector<coords_t, std::allocator<coords_t> > const&)
  0.24      4.13     0.01        9     1.11     1.11  std::vector<short, std::allocator<short> >* std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<std::vector<short, std::alloca

This seems to suggest that I don't need to bother looking anywhere but ::R_impl(...).

At the same time, if I compile without the -pg switch and run valgrind --tool=callgrind ./a.out instead, I get something rather different. Here is a screenshot of the kcachegrind output:

[Image: screenshot of the kcachegrind output]

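If I interpret this correctly, it seems to suggest that ::R_impl(...) takes only about 50% of the time, while the other half is spent in linear algebra (Wrat(...), eigenvalues, and the underlying lapack calls), all of which ranked far lower in the gprof profile.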

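I understand that gprof and cachegrind use different techniques, and I would not mind if their results differed somewhat. But here they look very different, and I am at a loss as to how to interpret them. Any ideas or suggestions?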

Best answer

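You are looking at the wrong column. Look at the second column of the kcachegrind output, the one named "self". It shows the time spent in a subroutine itself, excluding the subroutines it calls. The first column shows the cumulative (inclusive) time, which adds in all callees and reaches 100% of the machine time for the top-level entry; in my opinion it is not very informative.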
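To make the difference between the two columns concrete, here is a minimal sketch of how an inclusive figure relates to self figures in a call tree. The `CallNode` type, the function names, and the costs are all hypothetical and chosen for illustration; they are not taken from the profile in the question, and this is not how kcachegrind itself is implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical call-tree node. self_cost is the cost charged to the
// function's own instructions, excluding anything it calls.
struct CallNode {
    std::string name;
    double self_cost;
    std::vector<CallNode> children;
};

// Inclusive cost = self cost + inclusive cost of every callee.
double inclusive_cost(const CallNode& node) {
    assert(node.self_cost >= 0.0);
    double total = node.self_cost;
    for (const CallNode& child : node.children) {
        total += inclusive_cost(child);
    }
    return total;
}

// Illustrative tree with made-up costs (assumption, not measured data).
CallNode make_example_tree() {
    return CallNode{"main", 1.0, {
        CallNode{"driver", 2.0, {
            CallNode{"hot_kernel", 90.0, {}},
            CallNode{"helper", 7.0, {}},
        }},
    }};
}
```

In this made-up tree, `main` and `driver` have the largest inclusive costs even though almost all of the work happens in `hot_kernel`. Sorting by the inclusive column therefore pushes thin wrappers to the top, while sorting by self cost points at the function that does the work.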

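Note that the kcachegrind output shows a total time of 53.64 seconds for the process, while the time spent in the subroutine "R_impl" itself is 46.72 seconds, which is 87% of the total. So gprof and kcachegrind agree almost perfectly.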
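The comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the answer and in the gprof flat profile; the helper name `percent_of_total` and the constant names are made up for this example.

```cpp
#include <cassert>

// Share of the total attributed to one function, in percent.
double percent_of_total(double part, double total) {
    assert(total > 0.0);
    return 100.0 * part / total;
}

// Figures quoted in the answer (kcachegrind, "self" column and total).
const double kcachegrind_total = 53.64;
const double kcachegrind_r_impl_self = 46.72;

// "% time" reported for R_impl in the gprof flat profile.
const double gprof_r_impl_percent = 89.62;
```

Calling `percent_of_total(kcachegrind_r_impl_self, kcachegrind_total)` gives roughly 87.1, which is within about 2.5 percentage points of the 89.62 that gprof reports for the same function.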

Source: this question about gprof vs. cachegrind profiles comes from Stack Overflow: https://stackoverflow.com/questions/6316697/
