linux - 为什么clock_gettime如此不稳定？-6ren

linux - 为什么clock_gettime如此不稳定？

转载作者：行者123 更新时间：2023-12-01 19:07:29

介绍

部分“旧问题”包含初始问题(此后已添加了“进一步调查和结论”)。

跳到下面的进一步研究部分，详细比较不同的计时方法(rdtsc，clock_gettime和QueryThreadCycleTime)。

我相信CGT的不稳定行为可以归因于错误的内核或错误的CPU(请参见结论部分)。

用于测试的代码位于该问题的底部(请参阅附录部分)。

很抱歉的长度。

旧问题

简而言之:我正在使用 clock_gettime来衡量许多代码段的执行时间。我在独立运行之间的测量结果非常不一致。与其他方法相比，该方法具有极高的标准偏差(请参阅下面的说明)。

问题:与其他方法相比， clock_gettime是否给出如此不一致的测量值是有原因的吗？是否存在具有相同分辨率的替代方法来解决线程空闲时间？

说明:我正在尝试分析C代码的一小部分。每个代码段的执行时间不超过几微秒。在一次运行中，每个代码段将执行数百次，从而产生 runs × hundreds的测量值。

我还必须只测量线程实际花费在执行上的时间(这就是 rdtsc不适合的原因)。我还需要高分辨率(这就是 times不适合的原因)。

我尝试了以下方法:

rdtsc(在Linux和Windows上)，

clock_gettime(在Linux上为“CLOCK_THREAD_CPUTIME_ID”；)和

QueryThreadCycleTime(在Windows上)。

方法论:分析进行了25次运行。在每次运行中，单独的代码段重复101次。因此，我有2525个测量值。然后，我查看测量值的直方图，并计算一些基本信息(例如均值，std.dev。，中位数，众数，最小值和最大值)。

我没有介绍如何测量这三种方法的“相似性”，但这只是涉及每个代码段中所用时间比例的基本比较(“比例”表示时间已标准化)。然后，我看看这些比例的纯粹差异。该比较表明，在25次运行中取平均值时，所有“rdtsc”，“QTCT”和“CGT”的比例均相同。但是，以下结果表明“CGT”具有非常大的标准偏差。这使得它在我的用例中无法使用。

结果:

相同代码段的 clock_gettime和 rdtsc的比较(25次运行的101次测量= 2525个读数):

clock_gettime :

1881年测量的11 ns，

595的测量值(几乎正常分布)在3369到3414 ns之间，

2次测量为11680 ns，

1测量为1506022 ns，以及

其余时间在900到5000 ns之间。

分钟:11 ns

最大值:1506022 ns

平均值:1471.862 ns

中位数:11 ns

模式:11 ns

Stddev:29991.034

rdtsc (注意:在此运行期间未发生上下文切换，但是如果发生此切换，通常只会进行30000个滴答左右的单次测量):

在274和325滴答之间进行1178次测量

上

306的测量值介于326和375之间

在376和425滴答之间进行910次测量

在426和990滴答之间进行129次测量，

1测量1240个滴答，

1个测量的1256个刻度。

敏:274个滴答声

最大值:1256个滴答声

均值:355.806滴答声

中位数:333 ticks

模式:376个滴答声

标准差:83.896

讨论:

rdtsc在Linux和Windows上均提供非常相似的结果。它具有可接受的标准偏差-实际上相当一致/稳定。但是，它不考虑线程空闲时间。因此，上下文切换使测量变得不稳定(在Windows上我经常观察到这一点:平均大约1000个滴答声的代码段有时会不时地消耗〜30000个滴答声-肯定是由于抢占而来)。

QueryThreadCycleTime给出非常一致的测量值-即与rdtsc相比，标准偏差要低得多。当没有上下文切换发生时，此方法几乎与rdtsc相同。

另一方面，

clock_gettime产生极其不一致的结果(不仅在运行之间，而且在测量之间)。标准偏差非常大(与rdtsc相比)。

我希望统计数字可以。但是，两种方法之间的测量值出现如此差异的原因可能是什么？当然，有缓存，CPU/内核迁移等。但是，这一切都不应对“rdtsc”和“clock_gettime”之间的任何此类差异负责。到底是怎么回事？

进一步的调查

我对此进行了进一步调查。我做了两件事:

测量仅调用clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t)(请参阅附录中的代码1)和

的开销
在称为 clock_gettime的简单循环中将

并将读数存储到数组中(请参阅附录中的代码2)。我测量了增量时间(相继的测量时间中的差异，应该与clock_gettime调用的开销相对应)。

我已经在具有两个不同Linux Kernel版本的两台不同计算机上对其进行了测量:

CGT :

CPU :Core 2 Duo L9400 @ 1.86GHz

内核:Linux 2.6.40-4.fc15.i686＃1 SMP Fri Jul 29 18:54:39 UTC 2011 i686 i686 i386

结果:

估计的clock_gettime开销:690-710 ns之间

Delta时间:

平均:815.22 ns

中位数:713 ns

模式:709 ns

分钟:698 ns

最大值:23359 ns

直方图(遗漏范围的频率为0):

      Range       |  Frequency
------------------+-----------
  697 < x ≤ 800   ->     78111  <-- cached?
  800 < x ≤ 1000  ->     16412
 1000 < x ≤ 1500  ->         3
 1500 < x ≤ 2000  ->      4836  <-- uncached?
 2000 < x ≤ 3000  ->       305
 3000 < x ≤ 5000  ->       161
 5000 < x ≤ 10000 ->       105
10000 < x ≤ 15000 ->        53
15000 < x ≤ 20000 ->         8
20000 < x         ->         5

CPU :4×双核AMD Opteron处理器275

内核:Linux 2.6.26-2-amd64＃1 SMP Sun Jun 20 20:16:30 UTC 2010 x86_64 GNU/Linux

结果:

估计的clock_gettime开销:279-283 ns之间

Delta时间:

平均:320.00

中位数:1

模式:1

最小值:1

最大值:3495529

直方图(遗漏范围的频率为0):

      Range         |  Frequency
--------------------+-----------
          x ≤ 1     ->     86738  <-- cached?
    282 < x ≤ 300   ->     13118  <-- uncached?
    300 < x ≤ 440   ->        78
   2000 < x ≤ 5000  ->        52
   5000 < x ≤ 30000 ->         5
3000000 < x         ->         8

RDTSC :

相关代码 rdtsc_delta.c和 rdtsc_overhead.c。

CPU :Core 2 Duo L9400 @ 1.86GHz

内核:Linux 2.6.40-4.fc15.i686＃1 SMP Fri Jul 29 18:54:39 UTC 2011 i686 i686 i386

结果:

估计的开销:39-42滴答之间

Delta时间:

平均:52.46滴答声

中位数:42个滴答声

模式:42个滴答声

最低:35 ticks

最多:28700个滴答声

直方图(遗漏范围的频率为0):

      Range       |  Frequency
------------------+-----------
   34 < x ≤ 35    ->     16240  <-- cached?
   41 < x ≤ 42    ->     63585  <-- uncached? (small difference)
   48 < x ≤ 49    ->     19779  <-- uncached?
   49 < x ≤ 120   ->       195
 3125 < x ≤ 5000  ->       144
 5000 < x ≤ 10000 ->        45
10000 < x ≤ 20000 ->         9
20000 < x         ->         2

CPU :4×双核AMD Opteron处理器275

内核:Linux 2.6.26-2-amd64＃1 SMP Sun Jun 20 20:16:30 UTC 2010 x86_64 GNU/Linux

结果:

估计的开销:在13.7-17.0滴答之间

Delta时间:

平均:35.44滴答声

中位数:16个滴答声

模式:16个滴答声

最少:14个滴答声

最多:16372个滴答声

直方图(遗漏范围的频率为0):

      Range       |  Frequency
------------------+-----------
   13 < x ≤ 14    ->       192
   14 < x ≤ 21    ->     78172  <-- cached?
   21 < x ≤ 50    ->     10818
   50 < x ≤ 103   ->     10624  <-- uncached?
 5825 < x ≤ 6500  ->        88
 6500 < x ≤ 8000  ->        88
 8000 < x ≤ 10000 ->        11
10000 < x ≤ 15000 ->         4
15000 < x ≤ 16372 ->         2

QTCT :

相关代码 qtct_delta.c和 qtct_overhead.c。

CPU :核心2 6700 @ 2.66GHz

内核:Windows 7 64位

结果:

估计的开销:890-940滴答之间

Delta时间:

平均:1057.30滴答声

中位数:890 ticks

模式:890个滴答声

分钟:880个滴答声

最多:29400个滴答声

直方图(遗漏范围的频率为0):

      Range       |  Frequency
------------------+-----------
  879 < x ≤ 890   ->     71347  <-- cached?
  895 < x ≤ 1469  ->       844
 1469 < x ≤ 1600  ->     27613  <-- uncached?
 1600 < x ≤ 2000  ->        55
 2000 < x ≤ 4000  ->        86
 4000 < x ≤ 8000  ->        43
 8000 < x ≤ 16000 ->        10
16000 < x         ->         1

结论

我相信，我的问题的答案将是在我的计算机上执行错误的实现(该计算机使用具有旧Linux内核的AMD CPU)。

使用旧内核的AMD机器的CGT结果显示了一些极端的读数。如果我们查看增量时间，就会发现最频繁的增量是1 ns。这意味着对 clock_gettime的调用花费了不到一纳秒的时间!而且，它还产生了许多非常大的增量(超过3000000 ns)!这似乎是错误的行为。 (也许无法解释的核心迁移？)

备注:

CGT和QTCT的开销很大。

也很难考虑它们的开销，因为CPU缓存似乎有很大的不同。

也许坚持RDTSC，将进程锁定在一个内核上，并分配实时优先级，这是最准确的方式来告诉您一段代码使用了多少个周期...

附录

代码1 : clock_gettime_overhead.c

#include <time.h>
#include <stdio.h>
#include <stdint.h>

/* Compiled & executed with:

    gcc clock_gettime_overhead.c -O0 -lrt -o clock_gettime_overhead
    ./clock_gettime_overhead 100000
*/

int main(int argc, char **args) {
    struct timespec tstart, tend, dummy;
    int n, N;
    N = atoi(args[1]);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &tstart);
    for (n = 0; n < N; ++n) {
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &dummy);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &dummy);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &dummy);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &dummy);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &dummy);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &dummy);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &dummy);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &dummy);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &dummy);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &dummy);
    }
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &tend);
    printf("Estimated overhead: %lld ns\n",
            ((int64_t) tend.tv_sec * 1000000000 + (int64_t) tend.tv_nsec
                    - ((int64_t) tstart.tv_sec * 1000000000
                            + (int64_t) tstart.tv_nsec)) / N / 10);
    return 0;
}

代码2 : clock_gettime_delta.c

#include <time.h>
#include <stdio.h>
#include <stdint.h>

/* Compiled & executed with:

    gcc clock_gettime_delta.c -O0 -lrt -o clock_gettime_delta
    ./clock_gettime_delta > results
*/

#define N 100000

int main(int argc, char **args) {
    struct timespec sample, results[N];
    int n;
    for (n = 0; n < N; ++n) {
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &sample);
        results[n] = sample;
    }
    printf("%s\t%s\n", "Absolute time", "Delta");
    for (n = 1; n < N; ++n) {
        printf("%lld\t%lld\n",
               (int64_t) results[n].tv_sec * 1000000000 + 
                   (int64_t)results[n].tv_nsec,
               (int64_t) results[n].tv_sec * 1000000000 + 
                   (int64_t) results[n].tv_nsec - 
                   ((int64_t) results[n-1].tv_sec * 1000000000 + 
                        (int64_t)results[n-1].tv_nsec));
    }
    return 0;
}

代码3 : rdtsc.h

static uint64_t rdtsc() {
#if defined(__GNUC__)
#   if defined(__i386__)
    uint64_t x;
    __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
    return x;
#   elif defined(__x86_64__)
    uint32_t hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)lo) | ((uint64_t)hi << 32);
#   else
#       error Unsupported architecture.
#   endif
#elif defined(_MSC_VER)
    return __rdtsc();
#else
#   error Other compilers not supported...
#endif
}

代码4 : rdtsc_delta.c

#include <stdio.h>
#include <stdint.h>
#include "rdtsc.h"

/* Compiled & executed with:

    gcc rdtsc_delta.c -O0 -o rdtsc_delta
    ./rdtsc_delta > rdtsc_delta_results

Windows:

    cl -Od rdtsc_delta.c
    rdtsc_delta.exe > windows_rdtsc_delta_results
*/

#define N 100000

int main(int argc, char **args) {
    uint64_t results[N];
    int n;
    for (n = 0; n < N; ++n) {
        results[n] = rdtsc();
    }
    printf("%s\t%s\n", "Absolute time", "Delta");
    for (n = 1; n < N; ++n) {
        printf("%lld\t%lld\n", results[n], results[n] - results[n-1]);
    }
    return 0;
}

代码5 : rdtsc_overhead.c

#include <time.h>
#include <stdio.h>
#include <stdint.h>
#include "rdtsc.h"

/* Compiled & executed with:

    gcc rdtsc_overhead.c -O0 -lrt -o rdtsc_overhead
    ./rdtsc_overhead 1000000 > rdtsc_overhead_results

Windows:

    cl -Od rdtsc_overhead.c
    rdtsc_overhead.exe 1000000 > windows_rdtsc_overhead_results
*/

int main(int argc, char **args) {
    uint64_t tstart, tend, dummy;
    int n, N;
    N = atoi(args[1]);
    tstart = rdtsc();
    for (n = 0; n < N; ++n) {
        dummy = rdtsc();
        dummy = rdtsc();
        dummy = rdtsc();
        dummy = rdtsc();
        dummy = rdtsc();
        dummy = rdtsc();
        dummy = rdtsc();
        dummy = rdtsc();
        dummy = rdtsc();
        dummy = rdtsc();
    }
    tend = rdtsc();
    printf("%G\n", (double)(tend - tstart)/N/10);
    return 0;
}

代码6 : qtct_delta.c

#include <stdio.h>
#include <stdint.h>
#include <Windows.h>

/* Compiled & executed with:

    cl -Od qtct_delta.c
    qtct_delta.exe > windows_qtct_delta_results
*/

#define N 100000

int main(int argc, char **args) {
    uint64_t ticks, results[N];
    int n;
    for (n = 0; n < N; ++n) {
        QueryThreadCycleTime(GetCurrentThread(), &ticks);
        results[n] = ticks;
    }
    printf("%s\t%s\n", "Absolute time", "Delta");
    for (n = 1; n < N; ++n) {
        printf("%lld\t%lld\n", results[n], results[n] - results[n-1]);
    }
    return 0;
}

代码7 : qtct_overhead.c

#include <stdio.h>
#include <stdint.h>
#include <Windows.h>

/* Compiled & executed with:

    cl -Od qtct_overhead.c
    qtct_overhead.exe 1000000
*/

int main(int argc, char **args) {
    uint64_t tstart, tend, ticks;
    int n, N;
    N = atoi(args[1]);
    QueryThreadCycleTime(GetCurrentThread(), &tstart);
    for (n = 0; n < N; ++n) {
        QueryThreadCycleTime(GetCurrentThread(), &ticks);
        QueryThreadCycleTime(GetCurrentThread(), &ticks);
        QueryThreadCycleTime(GetCurrentThread(), &ticks);
        QueryThreadCycleTime(GetCurrentThread(), &ticks);
        QueryThreadCycleTime(GetCurrentThread(), &ticks);
        QueryThreadCycleTime(GetCurrentThread(), &ticks);
        QueryThreadCycleTime(GetCurrentThread(), &ticks);
        QueryThreadCycleTime(GetCurrentThread(), &ticks);
        QueryThreadCycleTime(GetCurrentThread(), &ticks);
        QueryThreadCycleTime(GetCurrentThread(), &ticks);
    }
    QueryThreadCycleTime(GetCurrentThread(), &tend);
    printf("%G\n", (double)(tend - tstart)/N/10);
    return 0;
}

最佳答案

由于CLOCK_THREAD_CPUTIME_ID是使用rdtsc实现的，因此很可能会遇到与它相同的问题。 clock_gettime的手册页显示:

The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are realized on many platforms using timers from the CPUs (TSC on i386, AR.ITC on Itanium). These registers may differ between CPUs and as a consequence these clocks may return bogus results if a process is migrated to another CPU.

听起来好像可以解释您的问题？也许您应该将进程锁定到一个CPU以获得稳定的结果？

关于linux - 为什么clock_gettime如此不稳定？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/6814792/

文章推荐： java - 如何比较两个日期字符串

文章推荐： scala - 为什么我不能链接多个 Scala 中缀方法调用

文章推荐： java - 在终端上使用 -classpath 不起作用

clock_gettime 获取ns为0
当我尝试获取 ns 时间时，我在内核为 2.6.18 的 Linux 中遇到了问题，如下所示: #include #include #include int main(void) { s
clock_gettime 每秒都会出现故障
我在我的 Mac 上用 C 来做基准测试。此处的代码可以编译并运行，但输出中会出现大约 1Hz 的故障。该程序“应该”每 2 毫秒中断一次，并报告上次中断时间的数据。默认设置是打印以下内容到终端 ev
Clock_Gettime() 抖动？
我在 Linux 2.6 上使用 clock_gettime()(来自 time.h)来控制线程循环中的计时。我需要在 +/- 5mS 时间范围内有 500mS。它似乎给了我 500 毫秒，然后开始漂
clock_gettime 返回一些非常大的值
我有一个在linux上运行的非常简单的代码，如下所示: struct timespec SysTime_Test; #define BILLION 1000000000L SysTime_Test.
Haskell 分析 - clock_gettime
我正在运行一个进行长时间计算的 Haskell 程序。经过一些分析和跟踪后，我注意到以下内容: $ /usr/bin/time -v ./hl test.hl 9000045000050000 Com
getrusage 与 clock_gettime()
我正在尝试获取 Ubuntu 上进程消耗的 CPU 时间。据我所知，有两个函数可以完成这项工作:getrusage() 和 clock_gettime(CLOCK_PROCESS_CPUTIME_ID
当程序从终端运行时，clock_gettime 需要更长的时间来执行
我试图测量一段代码的时间，并注意到当我从我的编辑器 QtCreator 中运行程序时，与我从 gnome 终端中启动的 bash shell 运行它时相比，时间快了大约 50ns。我使用 Ubuntu
c - Clock_gettime() 调用是原子的吗？
我在多线程代码中使用 timespec 结构 - 一个线程调用 clock_gettime()填充全局 timespec 结构，另一个 - 读取该结构。问题:是clock_gettime()调用原子还
C、clock_gettime，返回了不正确的纳秒值？
我正在编写一个简单的程序，它检查耗时是否超过 1 秒。我使用clock_gettime()获取开始时间，然后调用sleep(5)，获取新时间并检查差异是否大于1；我睡了 5 秒，那么它应该大于 5，但
c++ - Clock_gettime() 函数输出不正确的时间
我正在尝试使用 clock_gettime 函数获取以下代码的运行时间。但是，当我运行代码时，每次运行时我都会收到 0.0000 的时间。我也分别输出了开始和停止时间，我收到了完全相同的答案。 str
c++ clock_gettime()溢出了吗？
当使用下面的示例代码时，我认为 tv_nsec 值是循环的，因为它只是一个long； #include using namespace std; #include int main(int arg
linux - clock_gettime() 不起作用
我阅读了以下手册: http://linux.die.net/man/3/clock_gettime 然后我写了下面的代码: #include int main() { struct tim
linux - Clock_gettime 如何实现纳秒分辨率？
Linux 中的 clock_gettime 函数在调用以测量给定代码段的运行时间时，内部使用哪个系统硬件计时器将纳秒级分辨率返回给用户代码？最佳答案现代 CPU 以几 GHz 时钟频率运行。 1
MacOS 上的 clock_gettime
clock_gettime 不再适用于 MacOS Sierra。很确定在 Xcode 8 出来之前我已经正确地编译了这个。我真的很困惑如何才能让它正确编译。 #include #include
c - 如何使用 clock_gettime
我正在尝试使用 clock_gettime 函数，但无法弄清楚所需的 header 是什么(或者如果这不是 header 的问题，我做错了什么)。这是我的代码: #include #include
clock_gettime 总是显示 0
我想用 clock_gettime 测量挂钟时间，但每次运行我的代码时，它都显示 0。这是为什么？ (我希望我的结果以毫秒为单位。) #include #include #include #in
C、clock_gettime，返回不正确的纳秒值？
我正在编写一个简单的程序，用于检查耗时是否超过 1 秒。我使用 clock_gettime() 获取开始时间，然后调用 sleep(5)，获取新时间并检查差异是否大于 1；我睡了 5 秒，那么它应该大
c++ clock_gettime() 和夏令时
我在我的 C++ 程序中使用 clock_gettime() 来获取当前时间。但是，返回值是自 UTC 纪元以来的秒数。在夏令时期间，当时间偏移一小时时，此代码在我的时区可能会搞砸。系统本身有 NT
clock_gettime API 给出负值
我想要以微秒为单位的当前系统时间，所以我使用 clock_gettime 编写了一个程序，但它有时会返回负值。有人可以帮我解决这个问题吗？ int main(void) { struct ti
c++ - clock_gettime 返回一些异常值
我正在尝试用这个类分析一个程序: namespace smtx{ class CPerformance_clock { timespec t1; tim

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

linux - 为什么clock_gettime如此不稳定？