
c++ - Why does linking to librt swap performance between g++ and clang?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 00:43:12

I just found this answer from @tony-d, which includes benchmark code for measuring the overhead of virtual function calls. I ran the benchmark with g++:

$ g++ -O2 -o vdt vdt.cpp -lrt
$ ./vdt
virtual dispatch: 150000000 0.128562
switched: 150000000 0.0803207
overheads: 150000000 0.0543323
...
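
For readers without the linked answer at hand, here is a minimal sketch of what a "virtual dispatch vs. switch" benchmark can look like. It is not @tony-d's code: the types Shape, Circle, Square, the enum Kind, and the helpers timed, run_virtual and run_switched are all hypothetical names chosen for illustration, and std::chrono::steady_clock is used for timing as an assumption. The alternating object pattern is easy for a branch predictor to learn, so a serious benchmark would probably randomize it, and an optimizer may devirtualize or vectorize parts of these loops.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical class hierarchy for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual int weight() const = 0;
};
struct Circle : Shape { int weight() const override { return 1; } };
struct Square : Shape { int weight() const override { return 2; } };

// The same two cases expressed as a type tag.
enum class Kind { Circle, Square };

// Same computation as Shape::weight(), selected with a switch.
inline int weight_switched(Kind k) {
    switch (k) {
        case Kind::Circle: return 1;
        case Kind::Square: return 2;
    }
    return 0;
}

struct Result {
    long long sum;   // checksum, so the loop cannot be optimized away entirely
    double seconds;  // elapsed wall time of the measured loop
};

// Run a callable once and measure it with a monotonic clock.
template <typename F>
Result timed(F&& f) {
    auto start = std::chrono::steady_clock::now();
    long long sum = f();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {sum, elapsed.count()};
}

// Sum weights through virtual calls on alternating Circle/Square objects.
Result run_virtual(std::size_t n) {
    static const Circle circle;
    static const Square square;
    std::vector<const Shape*> shapes(n);
    for (std::size_t i = 0; i < n; ++i) {
        shapes[i] = (i % 2 == 0) ? static_cast<const Shape*>(&circle)
                                 : static_cast<const Shape*>(&square);
    }
    return timed([&] {
        long long sum = 0;
        for (const Shape* s : shapes) sum += s->weight();
        return sum;
    });
}

// Sum the same weights through a switch on a type tag.
Result run_switched(std::size_t n) {
    std::vector<Kind> kinds(n);
    for (std::size_t i = 0; i < n; ++i) {
        kinds[i] = (i % 2 == 0) ? Kind::Circle : Kind::Square;
    }
    return timed([&] {
        long long sum = 0;
        for (Kind k : kinds) sum += weight_switched(k);
        return sum;
    });
}
```

Calling run_virtual(n) and run_switched(n) with the same n and printing the seconds field of each result gives two timings that can be compared in the same way as the "virtual dispatch" and "switched" lines above.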

I got better performance than he did (a ratio of about 2), but then I checked with clang:

$ clang++-3.7 -O2 -o vdt vdt.cpp -lrt
$ ./vdt
virtual dispatch: 150000000 0.462368
switched: 150000000 0.0569544
overheads: 150000000 0.0509332
...

Now the ratio goes up to about 70!
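
The question does not spell out how these ratios are computed. One reading that matches the "about 70" figure is to subtract the "overheads" time from both the "virtual dispatch" and the "switched" times before dividing. The sketch below assumes that reading; adjusted_ratio is a hypothetical helper name, not something from the benchmark.

```cpp
#include <cassert>

// Ratio of virtual-dispatch time to switch time after subtracting the
// shared loop overhead from both. Assumes switched > overhead.
double adjusted_ratio(double virtual_dispatch, double switched, double overhead) {
    assert(switched > overhead);
    return (virtual_dispatch - overhead) / (switched - overhead);
}
```

With the timings above, adjusted_ratio(0.128562, 0.0803207, 0.0543323) comes out a little under 3 for g++ with -lrt, and adjusted_ratio(0.462368, 0.0569544, 0.0509332) comes out a little over 68 for clang with -lrt. The clang denominator is only about 6 ms, so small timing noise moves this ratio a lot.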

Then I noticed the -lrt command-line argument, and after some googling about librt I tried building without it, with both g++ and clang:

$ g++ -O2 -o vdt vdt.cpp
$ ./vdt
virtual dispatch: 150000000 0.4661
switched: 150000000 0.0815865
overheads: 150000000 0.0543611
...
$ clang++-3.7 -O2 -o vdt vdt.cpp
$ ./vdt
virtual dispatch: 150000000 0.155901
switched: 150000000 0.0568319
overheads: 150000000 0.0492521
...

As you can see, the performance is swapped.

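From what I know about librt, it is needed for clock_gettime and other related time functions (maybe I'm wrong, in which case please correct me!), but the code compiles fine without -lrt, and as far as I can tell the timings look correct.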
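
On the linking point: since glibc 2.17, clock_gettime and its relatives are part of libc itself, and the man page asks for -lrt only on older glibc versions, which is why the build succeeds without the flag on a system like this one. The sketch below is a small illustration of that, assuming the benchmark times itself with clock_gettime as the question supposes; monotonic_now and seconds_between are hypothetical helper names. It says nothing about why the timings differ.

```cpp
#include <cassert>
#include <time.h>

// Read CLOCK_MONOTONIC into out; returns false if the call fails.
bool monotonic_now(timespec& out) {
    return clock_gettime(CLOCK_MONOTONIC, &out) == 0;
}

// Elapsed seconds from reading a to reading b.
double seconds_between(const timespec& a, const timespec& b) {
    return static_cast<double>(b.tv_sec - a.tv_sec) +
           static_cast<double>(b.tv_nsec - a.tv_nsec) * 1e-9;
}
```

On glibc 2.17 or newer, a translation unit using these helpers links with a plain g++ or clang++ command line; adding -lrt is accepted but not required for clock_gettime.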

Why does linking or not linking librt affect this code so much?


Information about my system and compilers:

$ g++ --version
g++-5 (Ubuntu 5.3.0-3ubuntu1~14.04) 5.3.0 20151204
Copyright (C) 2015 Free Software Foundation, Inc.

$ clang++-3.7 --version
Debian clang version 3.7.1-svn254351-1~exp1 (branches/release_37) (based on LLVM 3.7.1)
Target: x86_64-pc-linux-gnu
Thread model: posix

$ uname -a
Linux ****** 3.13.0-86-generic #130-Ubuntu SMP Mon Apr 18 18:27:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Best answer

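My guess is that this is related to the optimizer: if -lrt is specified, the build links against the library, so the optimizer has more data and can optimize differently.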

As for the difference, my g++ (4.8.4) gives the same results with and without -lrt, but clang (3.4-1ubuntu3) does show a difference. I ran each build through perf stat, with the following results:

$ g++ -O2 -o vdt vdt.cpp -lrt && perf stat -d ./vdt
virtual dispatch: 150000000 1.2304
switched: 150000000 0.131782
overheads: 150000000 0.0842732
virtual dispatch: 150000000 1.13689
switched: 150000000 0.137304
overheads: 150000000 0.0854806
virtual dispatch: 150000000 1.19261
switched: 150000000 0.133561
overheads: 150000000 0.0969093

Performance counter stats for './vdt':

4068.861539 task-clock (msec) # 0.961 CPUs utilized
1,068 context-switches # 0.262 K/sec
0 cpu-migrations # 0.000 K/sec
431 page-faults # 0.106 K/sec
11,977,128,883 cycles # 2.944 GHz [40.18%]
6,088,274,331 stalled-cycles-frontend # 50.83% frontend cycles idle [39.92%]
3,984,855,636 stalled-cycles-backend # 33.27% backend cycles idle [39.98%]
6,581,309,599 instructions # 0.55 insns per cycle
# 0.93 stalled cycles per insn [50.06%]
1,506,617,848 branches # 370.280 M/sec [50.12%]
303,871,937 branch-misses # 20.17% of all branches [49.88%]
2,708,080,460 L1-dcache-loads # 665.562 M/sec [49.94%]
559,844,530 L1-dcache-load-misses # 20.67% of all L1-dcache hits [50.28%]
0 LLC-loads # 0.000 K/sec [40.05%]
0 LLC-load-misses # 0.00% of all LL-cache hits [39.98%]

4.232477683 seconds time elapsed

$ g++ -O2 -o vdt vdt.cpp && perf stat -d ./vdt
virtual dispatch: 150000000 1.11517
switched: 150000000 0.14231
overheads: 150000000 0.0840234
virtual dispatch: 150000000 1.11355
switched: 150000000 0.130082
overheads: 150000000 0.116934
virtual dispatch: 150000000 1.16225
switched: 150000000 0.13281
overheads: 150000000 0.0798615

Performance counter stats for './vdt':

4050.314222 task-clock (msec) # 0.993 CPUs utilized
707 context-switches # 0.175 K/sec
0 cpu-migrations # 0.000 K/sec
402 page-faults # 0.099 K/sec
12,213,599,260 cycles # 3.015 GHz [39.72%]
6,987,416,990 stalled-cycles-frontend # 57.21% frontend cycles idle [40.25%]
4,675,829,189 stalled-cycles-backend # 38.28% backend cycles idle [40.17%]
6,611,623,206 instructions # 0.54 insns per cycle
# 1.06 stalled cycles per insn [50.54%]
1,505,162,879 branches # 371.616 M/sec [50.48%]
298,748,152 branch-misses # 19.85% of all branches [50.30%]
2,710,580,651 L1-dcache-loads # 669.227 M/sec [50.04%]
551,212,908 L1-dcache-load-misses # 20.34% of all L1-dcache hits [49.86%]
3 LLC-loads # 0.001 K/sec [39.62%]
0 LLC-load-misses # 0.00% of all LL-cache hits [40.01%]

4.080288324 seconds time elapsed

$ clang++ -O2 -o vdt vdt.cpp -lrt && perf stat -d ./vdt
virtual dispatch: 150000000 0.276252
switched: 150000000 0.11926
overheads: 150000000 0.0733678
virtual dispatch: 150000000 0.249832
switched: 150000000 0.0892711
overheads: 150000000 0.117108
virtual dispatch: 150000000 0.247705
switched: 150000000 0.109486
overheads: 150000000 0.0762541

Performance counter stats for './vdt':

1347.887606 task-clock (msec) # 0.989 CPUs utilized
222 context-switches # 0.165 K/sec
0 cpu-migrations # 0.000 K/sec
430 page-faults # 0.319 K/sec
3,558,892,668 cycles # 2.640 GHz [42.47%]
1,316,787,839 stalled-cycles-frontend # 37.00% frontend cycles idle [42.61%]
438,592,926 stalled-cycles-backend # 12.32% backend cycles idle [40.57%]
6,388,507,180 instructions # 1.80 insns per cycle
# 0.21 stalled cycles per insn [50.49%]
1,514,291,853 branches # 1123.456 M/sec [50.19%]
1,095,265 branch-misses # 0.07% of all branches [48.66%]
2,485,922,557 L1-dcache-loads # 1844.310 M/sec [47.99%]
577,213,257 L1-dcache-load-misses # 23.22% of all L1-dcache hits [48.20%]
2 LLC-loads # 0.001 K/sec [40.51%]
0 LLC-load-misses # 0.00% of all LL-cache hits [40.17%]

1.362403811 seconds time elapsed

$ clang++ -O2 -o vdt vdt.cpp && perf stat -d ./vdt
virtual dispatch: 150000000 1.0894
switched: 150000000 0.0849747
overheads: 150000000 0.0726611
virtual dispatch: 150000000 1.03949
switched: 150000000 0.0849843
overheads: 150000000 0.0768674
virtual dispatch: 150000000 1.07786
switched: 150000000 0.0893431
overheads: 150000000 0.0725624

Performance counter stats for './vdt':

3667.235804 task-clock (msec) # 0.993 CPUs utilized
356 context-switches # 0.097 K/sec
0 cpu-migrations # 0.000 K/sec
402 page-faults # 0.110 K/sec
11,052,067,182 cycles # 3.014 GHz [39.98%]
5,346,555,173 stalled-cycles-frontend # 48.38% frontend cycles idle [40.10%]
3,480,506,097 stalled-cycles-backend # 31.49% backend cycles idle [40.09%]
6,351,819,740 instructions # 0.57 insns per cycle
# 0.84 stalled cycles per insn [50.07%]
1,524,106,229 branches # 415.601 M/sec [50.17%]
299,296,742 branch-misses # 19.64% of all branches [50.05%]
2,393,484,447 L1-dcache-loads # 652.667 M/sec [49.93%]
554,010,640 L1-dcache-load-misses # 23.15% of all L1-dcache hits [49.88%]
0 LLC-loads # 0.000 K/sec [40.33%]
0 LLC-load-misses # 0.00% of all LL-cache hits [39.83%]

3.692786417 seconds time elapsed

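What I see is a difference in branch prediction with clang: branch-misses are 0.07% of all branches with -lrt versus 19.64% without it, while g++ stays at about 20% either way. That is consistent with the optimizer guess above.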
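
The percentages that perf stat prints can be rechecked from the raw counters. The sketch below does that arithmetic; percent and instructions_per_cycle are hypothetical helper names, and the inputs are meant to be copied from the output above.

```cpp
#include <cassert>
#include <cstdint>

// part as a percentage of whole, e.g. branch-misses out of branches.
double percent(std::uint64_t part, std::uint64_t whole) {
    assert(whole != 0);
    return 100.0 * static_cast<double>(part) / static_cast<double>(whole);
}

// Instructions retired per CPU cycle.
double instructions_per_cycle(std::uint64_t instructions, std::uint64_t cycles) {
    assert(cycles != 0);
    return static_cast<double>(instructions) / static_cast<double>(cycles);
}
```

For the two clang runs, percent(1095265, 1514291853) is about 0.07 and percent(299296742, 1524106229) is about 19.6, while instructions_per_cycle gives roughly 1.8 with -lrt and 0.57 without. The instruction counts are nearly identical in both runs, so the time difference shows up as stalled cycles rather than as extra work.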

Regarding "c++ - Why does linking to librt swap performance between g++ and clang?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/37183531/
