multithreading - Performance difference between atomic memory orderings

Reposted. Author: 行者123. Updated: 2023-12-03 12:47:02

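I wrote a small test to check the performance difference between atomic loads with different memory orderings, and found that performance is the same for relaxed and sequentially consistent ordering. Is this just the result of a suboptimal compiler implementation, or is it what I should generally expect on x86 processors? I am using gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3). I compiled the test with -O2 optimization (which is why the second test, which uses a simple variable, shows an execution time of zero).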

Results:
Start volatile tests with 1000000000 iterations
volatile test took 689438 microseconds. Last value of local var is 1
Start simple var tests with 1000000000 iterations
simple var test took 0 microseconds. Last value of local var is 2
Start relaxed atomic tests with 1000000000 iterations
relaxed atomic test took 25655002 microseconds. Last value of local var is 3
Start sequentially consistent atomic tests with 1000000000 iterations
sequentially consistent atomic test took 24844000 microseconds. Last value of local var is 4

Here are the test functions:

#include <atomic>
#include <chrono>
#include <iostream>

using std::cout;
using std::endl;
using std::memory_order_relaxed;
using std::memory_order_seq_cst;
using std::chrono::duration_cast;
using std::chrono::microseconds;

std::atomic<int> atomic_var;

void relaxed_atomic_test(const unsigned iterations)
{
    cout << "Start relaxed atomic tests with " << iterations << " iterations" << endl;
    // duration_cast keeps this valid when system_clock ticks are finer than microseconds.
    const microseconds start = duration_cast<microseconds>(std::chrono::system_clock::now().time_since_epoch());
    int local_var = 0;
    for (unsigned counter = 0; iterations != counter; ++counter)
    {
        // Relaxed load: atomic, but with no ordering guarantees.
        local_var = atomic_var.load(memory_order_relaxed);
    }
    const microseconds end = duration_cast<microseconds>(std::chrono::system_clock::now().time_since_epoch());
    cout << "relaxed atomic test took " << (end - start).count()
         << " microseconds. Last value of local var is " << local_var << endl;
}

void sequentially_consistent_atomic_test(const unsigned iterations)
{
    cout << "Start sequentially consistent atomic tests with "
         << iterations << " iterations" << endl;
    const microseconds start = duration_cast<microseconds>(std::chrono::system_clock::now().time_since_epoch());
    int local_var = 0;
    for (unsigned counter = 0; iterations != counter; ++counter)
    {
        // Sequentially consistent load: the strongest (and default) ordering.
        local_var = atomic_var.load(memory_order_seq_cst);
    }
    const microseconds end = duration_cast<microseconds>(std::chrono::system_clock::now().time_since_epoch());
    cout << "sequentially consistent atomic test took " << (end - start).count()
         << " microseconds. Last value of local var is " << local_var << endl;
}
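The volatile and simple-variable tests appear in the results but not in the posted code. A rough sketch of what such baselines might look like, assuming they mirror the atomic loops; the names `volatile_var`, `simple_var`, `read_volatile`, `read_simple` and `check_baselines` are hypothetical, and the initial values 1 and 2 are only inferred from the printed output:

```cpp
#include <cassert>

// Hypothetical baselines: the question reports these tests but does not show them.
volatile int volatile_var = 1;
int simple_var = 2;

// Each iteration must really read volatile_var: accesses to a volatile object
// are observable behavior, so the compiler may not drop or merge them.
int read_volatile(const unsigned iterations)
{
    int local_var = 0;
    for (unsigned counter = 0; iterations != counter; ++counter)
    {
        local_var = volatile_var;
    }
    return local_var;
}

// Only the final value is observable here, so an optimizing compiler is free
// to replace the whole loop with a single read.
int read_simple(const unsigned iterations)
{
    int local_var = 0;
    for (unsigned counter = 0; iterations != counter; ++counter)
    {
        local_var = simple_var;
    }
    return local_var;
}

// Both loops return the variable's value; only the amount of work differs.
void check_baselines()
{
    assert(read_volatile(10) == 1);
    assert(read_simple(10) == 2);
}
```

This is consistent with the simple-variable test reporting 0 microseconds at -O2 while the volatile test still takes measurable time.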

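Update: I ran the same test, but writing to the atomic variable instead of reading it. The results are completely different: writing to a memory_order_relaxed atomic takes the same time as writing to a volatile: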

Start volatile tests with 1000000000 iterations
volatile test took 764088 microseconds. Last volatile_var value 999999999
Start simple var tests with 1000000000 iterations
simple var test took 0 microseconds. Last var value999999999
Start relaxed atomic tests with 1000000000 iterations
relaxed atomic test took 763968 microseconds. Last atomic_var value 999999999
Start sequentially consistent atomic tests with 1000000000 iterations
sequentially consistent atomic test took 15287267 microseconds. Last atomic_var value 999999999

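So I can conclude that, in a single thread, an atomic with relaxed memory ordering behaves like a volatile for store operations, and like an atomic with sequentially consistent memory ordering for load operations (with this processor and compiler).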
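The store tests themselves are not shown in the question. A minimal sketch of what they might look like, assuming they mirror the load tests; `timed_atomic_store` and `report_store_test` are hypothetical names, and `steady_clock` is swapped in for `system_clock` because it is monotonic:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <iostream>

// Hypothetical reconstruction: the question does not show its store tests.
std::atomic<int> atomic_var(0);

// Performs `iterations` stores with ordering `Order` and returns the elapsed
// time in microseconds. The ordering is a template parameter so that it is
// always a compile-time constant.
template <std::memory_order Order>
long long timed_atomic_store(const unsigned iterations)
{
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (unsigned counter = 0; iterations != counter; ++counter)
    {
        atomic_var.store(static_cast<int>(counter), Order);
    }
    const std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

// Prints a report in the same shape as the output in the question.
template <std::memory_order Order>
void report_store_test(const char* name, const unsigned iterations)
{
    std::cout << "Start " << name << " atomic tests with " << iterations << " iterations" << std::endl;
    const long long elapsed = timed_atomic_store<Order>(iterations);
    // Single-threaded sanity check: the last value written must be read back.
    assert(iterations == 0 || atomic_var.load() == static_cast<int>(iterations - 1));
    std::cout << name << " atomic test took " << elapsed
              << " microseconds. Last atomic_var value " << atomic_var.load() << std::endl;
}
```

Calling `report_store_test<std::memory_order_relaxed>("relaxed", 1000000000u)` followed by the `std::memory_order_seq_cst` variant should produce output of the same shape as above. Passing the ordering as a runtime argument would also compile, but a compiler may then handle it conservatively, which would blur the comparison.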

Best answer

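x86 has a relatively strict memory model, so you are likely to see similar performance between the two. You will see a bigger difference on architectures that allow more reordering, such as POWER.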
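To make the answer concrete, here is a sketch of the four operations involved. The function names are illustrative, and the instruction sequences mentioned in the comments are what mainstream compilers typically emit, not something the source states; inspecting the generated assembly (for example with `g++ -O2 -S`) is the way to confirm it for a particular toolchain:

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> shared_value(0);

// Loads: on x86-64 both are typically a plain `mov`, because ordinary x86
// loads are already strongly ordered. On POWER the seq_cst load typically
// needs extra synchronization instructions around it.
int load_relaxed() { return shared_value.load(std::memory_order_relaxed); }
int load_seq_cst() { return shared_value.load(std::memory_order_seq_cst); }

// Stores: on x86-64 the relaxed store is typically a plain `mov`, while the
// seq_cst store needs a full barrier (commonly `xchg`, or `mov` + `mfence`).
void store_relaxed(const int value) { shared_value.store(value, std::memory_order_relaxed); }
void store_seq_cst(const int value) { shared_value.store(value, std::memory_order_seq_cst); }

// In a single thread all four are functionally interchangeable; only the
// generated instructions, and therefore the cost, can differ.
void check_single_threaded_equivalence()
{
    store_relaxed(1);
    assert(load_relaxed() == 1 && load_seq_cst() == 1);
    store_seq_cst(2);
    assert(load_relaxed() == 2 && load_seq_cst() == 2);
}
```

This matches the store results in the question, where only the sequentially consistent store is slow. The load timings there are far slower than the volatile baseline for both orderings, which suggests that the GCC 4.4 library was inserting barriers around atomic loads regardless of the requested ordering; that is an inference from the numbers, not something the source confirms.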

Regarding multithreading - Performance difference between atomic memory orderings, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/21298749/
