c++11 - Measuring how much slower an atomic increment is compared to a regular integer increment

Reposted by: 行者123  Updated: 2023-12-04 20:37:30

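A recent discussion made me wonder how expensive an atomic increment is compared to a regular integer increment.

I wrote some code to try to benchmark this: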

#include <iostream>
#include <atomic>
#include <chrono>
#include <cstdlib>  // atoi

static const int NUM_TEST_RUNS = 100000;
static const int ARRAY_SIZE = 500;

void runBenchmark(std::atomic<int>& atomic_count, int* count_array, int array_size, bool do_atomic_increment){    
    for(int i = 0; i < array_size; ++i){
        ++count_array[i];        
    }

    if(do_atomic_increment){
        ++atomic_count;
    }
}

int main(int argc, char* argv[]){

    int num_test_runs = NUM_TEST_RUNS;
    int array_size = ARRAY_SIZE;

    if(argc == 3){
        num_test_runs = atoi(argv[1]);
        array_size = atoi(argv[2]);        
    }

    if(num_test_runs <= 0 || array_size <= 0){
        std::cout << "Usage: atomic_operation_overhead <num_test_runs> <num_integers_in_array>" << std::endl;
        return 1;   
    }

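    // Instantiate atomic counter (explicitly initialized to zero)
    std::atomic<int> atomic_count(0);

    // Allocate the integer buffer that will be updated every time;
    // the trailing () zero-initializes it so the final sum is well defined
    int* count_array = new int[array_size]();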

    // Track the time elapsed when each run also performs the atomic increment
    auto start = std::chrono::steady_clock::now();
    for(int i = 0; i < num_test_runs; ++i){
        runBenchmark(atomic_count, count_array, array_size, true);        
    }
    auto end = std::chrono::steady_clock::now();

    // Calculate time elapsed for the runs with the atomic increment
    auto diff_with_lock = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
    std::cout << "Elapsed time with atomic increment for " 
              << num_test_runs << " test runs: "
              << diff_with_lock.count() << " ns" << std::endl;

    // Track the time elapsed when the atomic increment is skipped
    start = std::chrono::steady_clock::now();
    for(int i = 0; i < num_test_runs; ++i){
        runBenchmark(atomic_count, count_array, array_size, false);
    }
    end = std::chrono::steady_clock::now();

    // Calculate time elapsed for the runs without the atomic increment
    auto diff_without_lock = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
    std::cout << "Elapsed time without atomic increment for " 
              << num_test_runs << " test runs: "
              << diff_without_lock.count() << " ns" << std::endl;

    auto difference_running_times = diff_with_lock - diff_without_lock;
    auto proportion = difference_running_times.count() / (double)diff_without_lock.count();          
    std::cout << "How much slower was locking: " << proportion * 100.0 << " %" << std::endl;           

    // We loop over all entries in the array and print their sum
    // We do this mainly to prevent the compiler from optimizing out
    // the loop where we increment all the values in the array
    int array_sum = 0;
    for(int i = 0; i < array_size; ++i){
        array_sum += count_array[i];
    }
    std::cout << "Array sum (just to prevent loop getting optimized out): " << array_sum << std::endl;

    delete [] count_array;

    return 0;
}

The problem I have is that this program produces wildly different results on each run:

balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 1000 500
Elapsed time with atomic increment for 1000 test runs: 99852 ns
Elapsed time without atomic increment for 1000 test runs: 96396 ns
How much slower was locking: 3.58521 %
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 1000 500
Elapsed time with atomic increment for 1000 test runs: 182769 ns
Elapsed time without atomic increment for 1000 test runs: 138319 ns
How much slower was locking: 32.1359 %
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 1000 500
Elapsed time with atomic increment for 1000 test runs: 98858 ns
Elapsed time without atomic increment for 1000 test runs: 96404 ns
How much slower was locking: 2.54554 %
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 1000 500
Elapsed time with atomic increment for 1000 test runs: 107848 ns
Elapsed time without atomic increment for 1000 test runs: 105174 ns
How much slower was locking: 2.54245 %
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 1000 500
Elapsed time with atomic increment for 1000 test runs: 113865 ns
Elapsed time without atomic increment for 1000 test runs: 100559 ns
How much slower was locking: 13.232 %
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 1000 500
Elapsed time with atomic increment for 1000 test runs: 98956 ns
Elapsed time without atomic increment for 1000 test runs: 106639 ns
How much slower was locking: -7.20468 %

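This leads me to believe that there may be a bug in the benchmarking code itself. Is there a mistake I am missing? Is my use of std::chrono for benchmarking incorrect? Or is the timing difference caused by overhead from signal handling in the OS related to atomic operations?

What could I be doing wrong?

Test bed: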

Intel® Core™ i7-4700MQ CPU @ 2.40GHz × 8  
8GB RAM 
GNU/Linux:Ubuntu LTS 14.04 (64 bit)
GCC version: 4.8.4    
Compilation: g++ -std=c++11 -O3 atomic_operation_overhead.cpp  -o atomic_operation_overhead

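Edit: Updated the test run output after compiling with -O3 optimization.

Edit: After running the test with more iterations and adding a sum over the array to keep the increment loop from being optimized out, as Adam suggested, I got more convergent results: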

balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7111974931 ns
Elapsed time without atomic increment for 99999999 test runs: 6938317779 ns
How much slower was locking: 2.50287 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7424952991 ns
Elapsed time without atomic increment for 99999999 test runs: 7262721866 ns
How much slower was locking: 2.23375 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7172114343 ns
Elapsed time without atomic increment for 99999999 test runs: 7030985219 ns
How much slower was locking: 2.00725 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7094552104 ns
Elapsed time without atomic increment for 99999999 test runs: 6971060941 ns
How much slower was locking: 1.77148 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7099907902 ns
Elapsed time without atomic increment for 99999999 test runs: 6970289856 ns
How much slower was locking: 1.85958 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7763604675 ns
Elapsed time without atomic increment for 99999999 test runs: 7229145316 ns
How much slower was locking: 7.39312 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7164534212 ns
Elapsed time without atomic increment for 99999999 test runs: 6994993609 ns
How much slower was locking: 2.42374 %
Array sum (just to prevent loop getting optimized out): 1215751192
balajeerc@Balajee:~/Projects/Misc$ ./atomic_operation_overhead 99999999 500
Elapsed time with atomic increment for 99999999 test runs: 7154697145 ns
Elapsed time without atomic increment for 99999999 test runs: 6997030700 ns
How much slower was locking: 2.25333 %
Array sum (just to prevent loop getting optimized out): 1215751192
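
A quick sanity check on the figures above, offered as a sketch rather than as part of the original post: dividing the timing gap by the number of runs gives a rough cost per atomic increment, and the repeated array sum is what 500 × 2 × 99999999 leaves after wrapping to 32 bits. The helper names `extra_ns_per_run` and `wrapped_sum` are hypothetical, and the wrap calculation assumes a 32-bit `int` (signed overflow is formally undefined, so this only reproduces what was observed).

```cpp
#include <cstdint>

// Rough extra cost per run attributable to the single atomic increment,
// given the two elapsed times (in ns) and the number of runs.
double extra_ns_per_run(std::int64_t with_ns, std::int64_t without_ns,
                        std::int64_t runs) {
    return static_cast<double>(with_ns - without_ns) / static_cast<double>(runs);
}

// Sum of `array_size` elements, each incremented `increments` times,
// reduced modulo 2^32 (assumes the benchmark's int is 32 bits wide).
std::uint32_t wrapped_sum(std::uint64_t array_size, std::uint64_t increments) {
    return static_cast<std::uint32_t>(array_size * increments);
}
```

For the first long run, `extra_ns_per_run(7111974931, 6938317779, 99999999)` comes to roughly 1.7 ns, and `wrapped_sum(500, 2 * 99999999ULL)` gives 1215751192, matching the printed array sum (each element is incremented once per run in both timed phases, hence the factor of 2). The percentage the program prints is relative to 500 plain increments per run, so it changes with the array size, while the per-run difference does not depend on it in the same way.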

Best answer

Some thoughts:

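Run more iterations, enough that the test takes at least a few seconds. Your runs take only milliseconds, so a single IO interrupt could be enough to skew your results.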
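
One way to act on this, sketched under assumptions: besides making each run longer, repeat the whole timed region several times and look at the minimum and median rather than a single figure. Repeating trials is an addition to the advice above, not something the answer spells out, and `time_trials` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <vector>

// Runs `work` `trials` times and returns the elapsed nanoseconds of each
// trial, sorted ascending: front() is the fastest, the middle is the median.
template <typename Work>
std::vector<std::int64_t> time_trials(Work work, int trials) {
    std::vector<std::int64_t> elapsed;
    for (int t = 0; t < trials; ++t) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto end = std::chrono::steady_clock::now();
        elapsed.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count());
    }
    std::sort(elapsed.begin(), elapsed.end());
    return elapsed;
}
```

A trial disturbed by an interrupt then shows up as an outlier at the end of the sorted vector instead of silently shifting the only measurement.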

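Print out the sum at the end. The compiler may be smart enough to optimize your loop far more than you expect, so your code may be doing less work than you think. If the compiler sees that the values are never read, it may remove your loop entirely.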
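
A minimal sketch of this point, assuming a hypothetical `consume` helper: storing the checksum into a `volatile` variable (or printing it) makes the result observable, so the compiler cannot treat the increments as dead code. It may still transform the loop, for example by vectorizing it.

```cpp
#include <vector>

// Writes to a volatile object count as observable behaviour, so a value
// passed through here must actually be computed.
inline void consume(long long value) {
    static volatile long long sink;
    sink = value;
}

// Increments every element `runs` times, then feeds the sum to consume().
long long increment_and_checksum(std::vector<int>& counts, int runs) {
    for (int r = 0; r < runs; ++r) {
        for (int& c : counts) {
            ++c;
        }
    }
    long long sum = 0;  // wider than int, so the total does not wrap like an int sum would
    for (int c : counts) {
        sum += c;
    }
    consume(sum);
    return sum;
}
```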

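Do all the iterations in one loop, rather than in a loop that calls a function. The compiler will probably inline your function call, but it is best not to introduce another potential source of noise.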
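
Sketched under assumptions, the measurement can live in one function with the loops written out inline. Making the atomic/non-atomic choice a template parameter is a design choice of this sketch, not something the answer prescribes; it keeps the decision out of the timed region. `time_inline` is a hypothetical name.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <vector>

// Times `runs` passes over `counts`; when WithAtomic is true, each pass also
// performs one atomic increment. Returns the elapsed time in nanoseconds.
template <bool WithAtomic>
std::int64_t time_inline(std::vector<int>& counts,
                         std::atomic<int>& atomic_count, int runs) {
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < runs; ++r) {
        for (int& c : counts) {
            ++c;
        }
        if (WithAtomic) {  // compile-time constant; an optimizing compiler folds it away
            ++atomic_count;
        }
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```

Call `time_inline<true>(counts, atomic_count, runs)` and `time_inline<false>(counts, atomic_count, runs)`, then read or print the contents of `counts` afterwards so the work stays observable.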

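I'm sure you will do this next anyway, but add a threaded test. You might as well do it for both variants; you will get a wrong sum in the non-atomic variable because of races, but at least you will see the performance penalty you pay for consistency.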
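
A hedged sketch of such a threaded test. A plain `int` written from several threads is a data race, which is undefined behaviour in C++11, so this sketch swaps in a separate relaxed load and store on a `std::atomic<int>` for the "non-atomic" case: the read-modify-write is no longer indivisible and updates can be lost, but the program stays well defined. `run_threads` and `Result` are hypothetical names; with GCC, build with `-pthread`.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

struct Result {
    std::int64_t elapsed_ns;  // wall-clock time, including thread start-up and join
    int total;                // final counter value
};

// Starts `num_threads` threads that each bump a shared counter `iters` times.
// indivisible == true  : one atomic read-modify-write per bump (fetch_add).
// indivisible == false : separate load and store, so bumps can be lost.
Result run_threads(int num_threads, int iters, bool indivisible) {
    std::atomic<int> counter(0);
    auto worker = [&counter, iters, indivisible] {
        for (int i = 0; i < iters; ++i) {
            if (indivisible) {
                counter.fetch_add(1);
            } else {
                int seen = counter.load(std::memory_order_relaxed);
                counter.store(seen + 1, std::memory_order_relaxed);
            }
        }
    };

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back(worker);
    }
    for (std::thread& th : threads) {
        th.join();
    }
    auto end = std::chrono::steady_clock::now();

    Result result;
    result.elapsed_ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    result.total = counter.load();
    return result;
}
```

`run_threads(4, 1000000, true).total` is always 4000000; with `false` the total is at most that and is usually smaller on a multi-core machine. Comparing the two `elapsed_ns` values gives a rough idea of what the indivisible update costs under contention.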

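This post on "c++11 - Measuring how much slower an atomic increment is compared to a regular integer increment" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/32687523/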
