
c++11 - Measuring how much slower an atomic increment is than a regular integer increment

Reposted. Author: 行者123. Updated: 2023-12-04 20:37:30

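A recent discussion made me wonder how expensive an atomic increment is compared to a regular integer increment.

I wrote some code to try to benchmark this: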

#include <atomic>
#include <chrono>
#include <cstdlib>   // std::atoi
#include <iostream>

static const int NUM_TEST_RUNS = 100000;
static const int ARRAY_SIZE = 500;

// Increment every array element once and, optionally, the atomic counter once.
void runBenchmark(std::atomic<int>& atomic_count, int* count_array, int array_size, bool do_atomic_increment){
    for(int i = 0; i < array_size; ++i){
        ++count_array[i];
    }

    if(do_atomic_increment){
        ++atomic_count;
    }
}

int main(int argc, char* argv[]){

    int num_test_runs = NUM_TEST_RUNS;
    int array_size = ARRAY_SIZE;

    if(argc == 3){
        num_test_runs = std::atoi(argv[1]);
        array_size = std::atoi(argv[2]);
    }

    // Reject unparsable (0) and negative arguments
    if(num_test_runs <= 0 || array_size <= 0){
        std::cout << "Usage: atomic_operation_overhead <num_test_runs> <num_integers_in_array>" << std::endl;
        return 1;
    }

    // Instantiate the atomic counter; a default-constructed std::atomic<int>
    // is uninitialized in C++11, so give it an explicit starting value
    std::atomic<int> atomic_count(0);

    // Allocate the integer buffer that is updated on every run, zero-initialized
    int* count_array = new int[array_size]();

    // Time the runs that include the atomic increment
    auto start = std::chrono::steady_clock::now();
    for(int i = 0; i < num_test_runs; ++i){
        runBenchmark(atomic_count, count_array, array_size, true);
    }
    auto end = std::chrono::steady_clock::now();

    // Elapsed time with the atomic increment
    auto diff_with_atomic = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
    std::cout << "Elapsed time with atomic increment for "
              << num_test_runs << " test runs: "
              << diff_with_atomic.count() << " ns" << std::endl;

    // Time the runs that skip the atomic increment
    start = std::chrono::steady_clock::now();
    for(int i = 0; i < num_test_runs; ++i){
        runBenchmark(atomic_count, count_array, array_size, false);
    }
    end = std::chrono::steady_clock::now();

    // Elapsed time without the atomic increment
    auto diff_without_atomic = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
    std::cout << "Elapsed time without atomic increment for "
              << num_test_runs << " test runs: "
              << diff_without_atomic.count() << " ns" << std::endl;

    // Relative overhead of the atomic increment (no lock is involved; the
    // "locking" wording in the label is kept so the output matches the logs)
    auto difference_running_times = diff_with_atomic - diff_without_atomic;
    auto proportion = difference_running_times.count() / (double)diff_without_atomic.count();
    std::cout << "How much slower was locking: " << proportion * 100.0 << " %" << std::endl;

    // We loop over all entries in the array and print their sum.
    // We do this mainly to prevent the compiler from optimizing out
    // the loop where we increment all the values in the array.
    // The sum is unsigned so that overflow on large runs wraps around
    // instead of being undefined behavior.
    unsigned int array_sum = 0;
    for(int i = 0; i < array_size; ++i){
        array_sum += static_cast<unsigned int>(count_array[i]);
    }
    std::cout << "Array sum (just to prevent loop getting optimized out): " << array_sum << std::endl;

    delete [] count_array;

    return 0;
}

The problem I am running into is that this program produces wildly different results from one run to the next:
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 1000 500
Elapsed time with atomic increment for 1000 test runs: 99852 ns
Elapsed time without atomic increment for 1000 test runs: 96396 ns
How much slower was locking: 3.58521 %
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 1000 500
Elapsed time with atomic increment for 1000 test runs: 182769 ns
Elapsed time without atomic increment for 1000 test runs: 138319 ns
How much slower was locking: 32.1359 %
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 1000 500
Elapsed time with atomic increment for 1000 test runs: 98858 ns
Elapsed time without atomic increment for 1000 test runs: 96404 ns
How much slower was locking: 2.54554 %
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 1000 500
Elapsed time with atomic increment for 1000 test runs: 107848 ns
Elapsed time without atomic increment for 1000 test runs: 105174 ns
How much slower was locking: 2.54245 %
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 1000 500
Elapsed time with atomic increment for 1000 test runs: 113865 ns
Elapsed time without atomic increment for 1000 test runs: 100559 ns
How much slower was locking: 13.232 %
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 1000 500
Elapsed time with atomic increment for 1000 test runs: 98956 ns
Elapsed time without atomic increment for 1000 test runs: 106639 ns
How much slower was locking: -7.20468 %

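This leads me to believe that there may be a mistake in the benchmarking code itself. Is there an error I am missing? Is my use of std::chrono for benchmarking incorrect? Or is the time difference caused by overhead from signal handling in the OS related to the atomic operations?

What might I be doing wrong?

Test bed: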
Intel® Core™ i7-4700MQ CPU @ 2.40GHz × 8  
8GB RAM
GNU/Linux:Ubuntu LTS 14.04 (64 bit)
GCC version: 4.8.4
Compilation: g++ -std=c++11 -O3 atomic_operation_overhead.cpp -o atomic_operation_overhead

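Edit: Updated the test run output after compiling with -O3 optimization.

Edit: After running the test with a larger number of iterations and adding a sum over the array to keep the increment loop from being optimized out, as Adam suggested, I get results that converge much better: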
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7111974931 ns
Elapsed time without atomic increment for 99999999 test runs: 6938317779 ns
How much slower was locking: 2.50287 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7424952991 ns
Elapsed time without atomic increment for 99999999 test runs: 7262721866 ns
How much slower was locking: 2.23375 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7172114343 ns
Elapsed time without atomic increment for 99999999 test runs: 7030985219 ns
How much slower was locking: 2.00725 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7094552104 ns
Elapsed time without atomic increment for 99999999 test runs: 6971060941 ns
How much slower was locking: 1.77148 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7099907902 ns
Elapsed time without atomic increment for 99999999 test runs: 6970289856 ns
How much slower was locking: 1.85958 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7763604675 ns
Elapsed time without atomic increment for 99999999 test runs: 7229145316 ns
How much slower was locking: 7.39312 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7164534212 ns
Elapsed time without atomic increment for 99999999 test runs: 6994993609 ns
How much slower was locking: 2.42374 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7154697145 ns
Elapsed time without atomic increment for 99999999 test runs: 6997030700 ns
How much slower was locking: 2.25333 %
Array sum (just to prevent loop getting optimized out): 1215751192
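
The runs above can also be read per operation rather than as a percentage, since the percentage depends on how large the array is. The following is a minimal sketch of that arithmetic, assuming the difference between the two totals is attributable entirely to the one atomic increment per run; `PerOpCost` and `perOperationCost` are hypothetical names introduced here, not part of the original program.

```cpp
#include <cassert>
#include <cstdint>

// Per-operation costs derived from the two wall-clock totals (hypothetical helper type).
struct PerOpCost {
    double atomic_ns;  // extra time per run, attributed to the single atomic increment
    double plain_ns;   // average time per plain array-element increment
};

// Assumption: the two timed phases differ only by one atomic increment per run.
PerOpCost perOperationCost(std::int64_t with_atomic_ns, std::int64_t without_atomic_ns,
                           std::int64_t num_test_runs, std::int64_t array_size) {
    assert(num_test_runs > 0 && array_size > 0);
    PerOpCost cost;
    cost.atomic_ns = static_cast<double>(with_atomic_ns - without_atomic_ns)
                     / static_cast<double>(num_test_runs);
    cost.plain_ns = static_cast<double>(without_atomic_ns)
                    / (static_cast<double>(num_test_runs) * static_cast<double>(array_size));
    return cost;
}
```

For the first run above (7111974931 ns and 6938317779 ns over 99999999 runs of 500 elements), this works out to about 1.74 ns per atomic increment and about 0.14 ns per plain increment. Both figures are only as reliable as the assumption that nothing else differs between the two phases.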

Best answer

Some thoughts:

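  • Run more iterations, enough that the test takes at least a few seconds. Your runs take milliseconds or less, so a single IO interrupt could be enough to skew your results.
  • Print out the sum at the end. The compiler may be clever enough to optimize your loop further than you expect, so your code may be doing less work than you think. If the compiler sees that the value is never read, it may remove your loop entirely.
  • Do the iterations in a single loop rather than in a loop that calls a function. The compiler will probably inline your function call, but it is better not to introduce another potential source of noise.
  • I'm sure you will do this next, but add a threaded test. You might as well do it for both; you will get a wrong sum in the non-atomic variable because of races, but at least you will see the performance penalty you pay for consistency.

A similar question on Stack Overflow: https://stackoverflow.com/questions/32687523/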
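
The threaded test suggested in the answer could be sketched roughly as follows. This is only an illustration under stated assumptions: `RunResult`, `runContended`, `atomicIncrement` and `racyIncrement` are hypothetical names, and the "non-atomic" counter is modelled as a separate relaxed load and store on a `std::atomic` rather than as a plain `int`, because a real data race on a plain `int` is undefined behavior in C++11. The measured time also includes starting and joining the threads. With GCC it would typically be built with `-std=c++11 -O3 -pthread`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Final counter value and wall-clock time of one contended run (hypothetical type).
struct RunResult {
    std::int64_t final_count;
    std::chrono::nanoseconds elapsed;
};

// Atomic read-modify-write: no increment can be lost.
inline void atomicIncrement(std::atomic<std::int64_t>& c) {
    c.fetch_add(1, std::memory_order_relaxed);
}

// Stand-in for a plain `++x`: the load and the store are separate steps, so
// concurrent increments can overwrite each other (lost updates), but there
// is no undefined behavior as there would be with a racy plain int.
inline void racyIncrement(std::atomic<std::int64_t>& c) {
    c.store(c.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);
}

// Run `increment` on one shared counter from `num_threads` threads,
// `iters_per_thread` times each, and time the whole thing.
template <typename Increment>
RunResult runContended(int num_threads, std::int64_t iters_per_thread, Increment increment) {
    assert(num_threads > 0 && iters_per_thread >= 0);
    std::atomic<std::int64_t> counter(0);
    std::vector<std::thread> workers;

    auto start = std::chrono::steady_clock::now();
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, iters_per_thread, increment] {
            // Single inner loop, as the answer recommends
            for (std::int64_t i = 0; i < iters_per_thread; ++i) {
                increment(counter);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    auto end = std::chrono::steady_clock::now();

    return {counter.load(),
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start)};
}
```

Calling `runContended(4, 100000, atomicIncrement)` always ends with a count of 400000, while `runContended(4, 100000, racyIncrement)` can end with anything up to 400000 depending on how many updates were lost. Comparing the two `elapsed` values gives a rough idea of what the consistency costs under contention.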
