
clock_gettime takes longer to execute when the program is run from a terminal

Reposted. Author: 行者123. Updated: 2023-12-03 09:57:50

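I was trying to time a piece of code and noticed that the measured times were about 50 ns shorter when I ran the program from inside my editor, QtCreator, than when I ran it from a bash shell started in gnome-terminal. I'm using Ubuntu 20.04 as the OS.
A small program to reproduce my problem: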

#include <stdio.h>
#include <time.h>

/* Read the monotonic clock. */
struct timespec now(void) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return now;
}

/* Nanoseconds elapsed between tick and tock. */
long interval_ns(struct timespec tick, struct timespec tock) {
    return (tock.tv_sec - tick.tv_sec) * 1000000000L
        + (tock.tv_nsec - tick.tv_nsec);
}

int main(void) {
    // sleep(1);  // needs #include <unistd.h> when enabled
    for (size_t i = 0; i < 10; i++) {
        struct timespec tick = now();
        struct timespec tock = now();
        long elapsed = interval_ns(tick, tock);
        printf("It took %ld ns\n", elapsed);  /* %ld matches long */
    }
    return 0;
}
Output when run from within QtCreator:
It took 84 ns
It took 20 ns
It took 20 ns
It took 21 ns
It took 21 ns
It took 21 ns
It took 22 ns
It took 21 ns
It took 20 ns
It took 21 ns
When run from my shell inside the terminal:
$ ./foo 
It took 407 ns
It took 136 ns
It took 74 ns
It took 73 ns
It took 77 ns
It took 79 ns
It took 74 ns
It took 81 ns
It took 74 ns
It took 78 ns
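Things I have tried that made no difference:
  • Letting QtCreator start the program in a terminal
  • Using rdtsc and rdtscp calls instead of clock_gettime (the relative difference in run times was the same)
  • Clearing the environment in the terminal by running the program under env -i
  • Starting the program with sh instead of bash

I have verified that the same binary is invoked in all cases.
I have verified that the program's nice value is 0 in all cases.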
Question
Why does starting the program from my shell make a difference? Any suggestions on what to try?
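Update
  • If I add a sleep(1) call at the start of main, both the QtCreator and the gnome-terminal/bash invocations report the longer execution times.
  • If I add a system("ps -H") call at the start of main, but remove the sleep(1) mentioned above, both invocations report the short execution times (~20 ns).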
Best answer

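Just add more iterations to give the CPU time to ramp up to its maximum clock speed. Your "slow" times are measured with the CPU at its low-power idle clock speed.

QtCreator apparently uses enough CPU time to make this happen before your program runs, or else you are compiling and then running, and the compilation serves as a warm-up. (By comparison, bash's fork/execve is much lighter weight.)

See "Idiomatic way of performance evaluation?" for more about doing warm-up runs when benchmarking, and also "Why does this delay-loop start to run faster after several iterations with no sleep?"

On my i7-6700k (Skylake) running Linux, increasing the loop iteration count to 1000 is enough to get the final iterations running at full clock speed, even though the first few iterations are spent handling page faults and warming up the iTLB, the uop cache, the data caches, and so on.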
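The warm-up idea can also be made explicit instead of relying on extra loop iterations. The following is only a minimal sketch under stated assumptions: the helper names `monotonic_ns`, `warm_up`, and `time_clock_pair_ns` are hypothetical (they do not appear in the question or the answer), and how long a warm-up is actually needed depends on the CPU and its power settings.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds on the monotonic clock, as a single 64-bit value. */
int64_t monotonic_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: busy-wait for at least duration_ns so the core is
 * under load (and has a chance to raise its clock) before timing starts.
 * Returns the number of loop iterations performed. */
uint64_t warm_up(int64_t duration_ns) {
    assert(duration_ns >= 0);
    int64_t start = monotonic_ns();
    uint64_t spins = 0;
    do {
        spins++;
    } while (monotonic_ns() - start < duration_ns);
    return spins;
}

/* Time two back-to-back clock reads, as the question's loop does. */
int64_t time_clock_pair_ns(void) {
    int64_t tick = monotonic_ns();
    int64_t tock = monotonic_ns();
    return tock - tick;
}
```

A caller would run `warm_up` once at the start of `main`, for example with a few tens of milliseconds (an arbitrary guess, not a measured value), and only then start the timed loop around `time_clock_pair_ns`.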

    $ ./a.out      
    It took 244 ns
    It took 150 ns
    It took 73 ns
    It took 76 ns
    It took 75 ns
    It took 71 ns
    It took 72 ns
    It took 72 ns
    It took 69 ns
    It took 75 ns
    ...
    It took 74 ns
    It took 68 ns
    It took 69 ns
    It took 72 ns
    It took 72 ns # 382 "slow" iterations in this test run (copy/paste into wc to check)
    It took 15 ns
    It took 15 ns
    It took 15 ns
    It took 15 ns
    It took 16 ns
    It took 16 ns
    It took 15 ns
    It took 15 ns
    It took 15 ns
    It took 15 ns
    It took 14 ns
    It took 16 ns
    ...
On my system, energy_performance_preference is set to balance_performance, so the hardware P-state governor is not as aggressive as it would be with performance. Use grep . /sys/devices/system/cpu/cpufreq/policy[0-9]*/energy_performance_preference to check the setting, and sudo to change it:
    sudo sh -c 'for i in /sys/devices/system/cpu/cpufreq/policy[0-9]*/energy_performance_preference;do echo balance_performance > "$i";done'
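Even running it under perf stat ./a.out is enough to ramp up to the maximum clock speed very quickly, though; it really doesn't take much. But bash's command parsing after you press Enter is very cheap, so little CPU work is done before it calls execve and your new process reaches main.

Incidentally, the printf with line-buffered output is what takes most of the CPU time in your program. That is why so few iterations are needed to ramp up to speed. For example, if you run perf stat --all-user -r10 ./a.out, you will see that user-space core clock cycles per second are only about 0.4 GHz; the rest of the time is spent in the kernel, in write system calls.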
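Because printf dominates the CPU time here, one option is to keep it out of the timed loop: store the intervals in an array and summarize them afterwards. The sketch below assumes that approach. The names `elapsed_ns`, `collect_samples`, `min_sample`, and `count_above` are hypothetical and not part of the original program. Note that a loop without printf finishes far sooner, so the same iteration count gives the CPU much less wall-clock time to raise its clock speed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Difference tock - tick in nanoseconds. */
int64_t elapsed_ns(struct timespec tick, struct timespec tock) {
    return (int64_t)(tock.tv_sec - tick.tv_sec) * 1000000000LL
         + (tock.tv_nsec - tick.tv_nsec);
}

/* Fill samples[0..n-1] with back-to-back clock_gettime intervals.
 * There is no printf in this loop, so it spends no time on output. */
void collect_samples(int64_t *samples, size_t n) {
    assert(samples != NULL);
    for (size_t i = 0; i < n; i++) {
        struct timespec tick, tock;
        clock_gettime(CLOCK_MONOTONIC, &tick);
        clock_gettime(CLOCK_MONOTONIC, &tock);
        samples[i] = elapsed_ns(tick, tock);
    }
}

/* Smallest sample; n must be at least 1. */
int64_t min_sample(const int64_t *samples, size_t n) {
    assert(n > 0);
    int64_t best = samples[0];
    for (size_t i = 1; i < n; i++)
        if (samples[i] < best) best = samples[i];
    return best;
}

/* Number of samples strictly above threshold_ns (the "slow" ones). */
size_t count_above(const int64_t *samples, size_t n, int64_t threshold_ns) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (samples[i] > threshold_ns) count++;
    return count;
}
```

After `collect_samples` returns, the program can print the minimum and the number of slow samples in one go. `count_above` with a threshold between the two plateaus is a programmatic stand-in for pasting the output into wc.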

Regarding "clock_gettime takes longer to execute when the program is run from a terminal", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63236025/
