Suppose I make two relaxed modifications to two atomic objects in thread0, one per object, and then make thread1 observe the modification that came second in thread0.
Now, without memory fences, if thread1 were to try to load the first object, it might not get the newly stored value (since there's no release on the store and no acquire on the observation). But what if it does an atomic read-modify-write operation on it?
#include <stdatomic.h>
#include <stddef.h>
#include <assert.h>
int _Atomic atoInt = 0;
_Atomic _Bool atoBool = 0;
void thread0(void){
    atomic_store_explicit(&atoInt, 42, memory_order_relaxed);
    atomic_signal_fence(memory_order_seq_cst); //prevent compiler reordering
    atomic_store_explicit(&atoBool, 1, memory_order_relaxed);
}
void thread1(void){
    //observe the 0 => 1 transition
    while(0 == atomic_load_explicit(&atoBool, memory_order_relaxed)){}
    atomic_signal_fence(memory_order_seq_cst); //prevent compiler reordering
    //could still get the stale value because release/acquire wasn't used
    int seen = atomic_load_explicit(&atoInt, memory_order_relaxed);
    assert(0 == seen || 42 == seen);
    //should hold regardless because it's an RMW?
    //(RMW kept outside assert() so it still runs when NDEBUG is defined)
    int old = atomic_fetch_sub_explicit(&atoInt, 1, memory_order_relaxed);
    assert(42 == old);
}
Based on my understanding of how cache coherence and fences work, I believe the RMW operation must necessarily get the new value.
Is this correct?
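To try the question's code rather than reason about it on paper, here is a minimal sketch of a driver, assuming POSIX threads (pthreads are not part of the original, which only shows the two thread bodies). The thread functions mirror thread0 and thread1 above; `run_question_scenario` is a hypothetical helper that runs one round and returns the value the relaxed `atomic_fetch_sub_explicit` saw. Under the ISO C memory model either 0 or 42 is a permitted result, so a run that happens to return 42 does not prove a guarantee exists.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <pthread.h>
#include <assert.h>

// Same shared state as in the question, reset before each run.
static _Atomic int atoInt;
static _Atomic bool atoBool;
static int rmw_result; // value returned by fetch_sub in the reader

static void *writer(void *arg) {
    (void)arg;
    atomic_store_explicit(&atoInt, 42, memory_order_relaxed);
    atomic_signal_fence(memory_order_seq_cst); // compiler barrier only
    atomic_store_explicit(&atoBool, true, memory_order_relaxed);
    return NULL;
}

static void *reader(void *arg) {
    (void)arg;
    // Spin until the 0 -> 1 transition of the flag is observed.
    while (!atomic_load_explicit(&atoBool, memory_order_relaxed)) {
    }
    atomic_signal_fence(memory_order_seq_cst); // compiler barrier only
    rmw_result = atomic_fetch_sub_explicit(&atoInt, 1, memory_order_relaxed);
    return NULL;
}

// Hypothetical helper: one writer/reader round; returns what the RMW read.
// Error handling for pthread_create/pthread_join omitted for brevity.
int run_question_scenario(void) {
    pthread_t t0, t1;
    atomic_store_explicit(&atoInt, 0, memory_order_relaxed);
    atomic_store_explicit(&atoBool, false, memory_order_relaxed);
    pthread_create(&t1, NULL, reader, NULL);
    pthread_create(&t0, NULL, writer, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL); // join makes rmw_result safe to read here
    return rmw_result;
}
```

If the RMW is ordered before the store of 42 in atoInt's modification order, it reads 0, writes -1, and is then overwritten by 42; nothing in the relaxed version rules that out.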
I don't think anything in the ISO C standard formally guarantees that an RMW would see the stored value. With compile-time reordering of the stores (since they're also both relaxed), it's super easy for the assert to fail even on a strongly-ordered machine like x86.
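What C11 does promise about RMWs is narrower than the question's intuition: per C11 7.17.3, an atomic read-modify-write always reads the last value, in that object's modification order, before its own write. That is why relaxed fetch_add counters never lose increments, as in this minimal sketch (assuming pthreads; `count_with_relaxed_rmw` and the thread and iteration counts are illustrative choices). It does not, however, force thread1's fetch_sub to be placed after thread0's store of 42 in atoInt's modification order.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <assert.h>

#define N_THREADS 4
#define N_INCREMENTS 100000

static atomic_int counter;

static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < N_INCREMENTS; i++) {
        // Relaxed RMW: no ordering with other variables, but no lost updates,
        // because each RMW reads the latest value in counter's modification order.
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    }
    return NULL;
}

// Hypothetical helper: returns the final counter value after all threads finish.
// Error handling for pthread_create/pthread_join omitted for brevity.
int count_with_relaxed_rmw(void) {
    pthread_t th[N_THREADS];
    atomic_store_explicit(&counter, 0, memory_order_relaxed);
    for (int i = 0; i < N_THREADS; i++)
        pthread_create(&th[i], NULL, bump, NULL);
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(th[i], NULL);
    return atomic_load_explicit(&counter, memory_order_relaxed);
}
```

The total is always N_THREADS * N_INCREMENTS, yet the same rule is fully compatible with the question's fetch_sub reading 0: atomicity constrains the RMW relative to its neighbours in the modification order, not relative to stores from another thread that it has no happens-before relationship with.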
A more interesting computer-architecture question is whether any real or hypothetical ISA could break dependency ordering for loads but not for RMWs. (Dependency ordering is the hardware behaviour, present on most ISAs, that memory_order_consume was intended to expose to programmers. That design proved impractical, so real code like Linux RCU uses effectively relaxed loads, written so that the compiler can't replace the load result with a constant or otherwise break the data dependency, e.g. by turning it into a control dependency (a branch).)
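For readers unfamiliar with the dependency ordering being discussed, here is a minimal sketch of the pointer-publication pattern it exists for, assuming pthreads (`struct payload`, `published` and `read_through_published_pointer` are hypothetical names). The writer initializes a plain object and then publishes a pointer to it with a release store; the reader loads the pointer with `memory_order_consume` and dereferences it. The read of `p->value` carries a data dependency on the loaded pointer, which is exactly what consume was meant to order cheaply; current GCC and Clang simply strengthen consume to acquire.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <pthread.h>
#include <assert.h>

struct payload {
    int value;
};

static struct payload shared_payload;
static _Atomic(struct payload *) published; // NULL until the writer publishes

static void *publisher(void *arg) {
    (void)arg;
    shared_payload.value = 42; // plain (non-atomic) initialization
    // Release store: the initialization above is visible to any thread that
    // reads this pointer and then dereferences it.
    atomic_store_explicit(&published, &shared_payload, memory_order_release);
    return NULL;
}

// Hypothetical helper: waits for the pointer, then reads through it.
// Error handling for pthread_create/pthread_join omitted for brevity.
int read_through_published_pointer(void) {
    pthread_t t;
    struct payload *p;
    atomic_store_explicit(&published, NULL, memory_order_relaxed);
    pthread_create(&t, NULL, publisher, NULL);
    // consume: p->value below is data-dependent on the value loaded into p.
    while ((p = atomic_load_explicit(&published, memory_order_consume)) == NULL) {
    }
    int v = p->value;
    pthread_join(t, NULL);
    return v;
}
```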
@PeterCordes Alright. And what if compile-time reordering is prevented? (I guess atomic_signal_fence(memory_order_seq_cst) should do that?)
The only real-world ISA without dependency ordering is DEC Alpha (see "Memory order consume usage in C11"). Hypothetical things that could break dependency ordering include value prediction for loads. If that's what you're thinking of, then yeah, perhaps: an atomic RMW can't retire until its store commits to L1d cache, and stores need to be non-speculative, so all prior speculation has to be confirmed before the store side of an atomic RMW can commit.
atomic_signal_fence(seq_cst) would fix it for x86, but weakly-ordered ISAs could still commit the stores to L1d cache out of order unless you use atomic_thread_fence(release) between them, or make the second one a release store. Then you're left with the question of reordering dependent loads. I'm pretty sure ISO C11 doesn't promise anything unless you use a release store and at least a consume load, but the only real implementation that could even possibly be "weird" with release stores but relaxed loads is DEC Alpha. And even then, maybe not in practice with an RMW.
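A minimal sketch of the fix described above, assuming pthreads (the `run_fenced_scenario` helper is hypothetical, and the thread functions take `void *` as pthreads requires rather than the question's `void`). The signal fences are replaced by `atomic_thread_fence(memory_order_release)` in the writer and `atomic_thread_fence(memory_order_acquire)` in the reader. The reader uses an acquire fence rather than consume, because the load of atoInt has no data dependency on atoBool, only a control dependency through the spin loop. Once the reader's relaxed flag load sees 1, the fences synchronize, so the store of 42 happens before both of the reader's accesses and ISO C guarantees that the plain load and the relaxed fetch_sub both see 42.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <pthread.h>
#include <assert.h>

static _Atomic int atoInt;
static _Atomic bool atoBool;
static int plain_load_result, rmw_result;

static void *thread0(void *arg) {
    (void)arg;
    atomic_store_explicit(&atoInt, 42, memory_order_relaxed);
    atomic_thread_fence(memory_order_release); // orders the store above before the flag store
    atomic_store_explicit(&atoBool, true, memory_order_relaxed);
    return NULL;
}

static void *thread1(void *arg) {
    (void)arg;
    while (!atomic_load_explicit(&atoBool, memory_order_relaxed)) {
    }
    atomic_thread_fence(memory_order_acquire); // pairs with the release fence in thread0
    plain_load_result = atomic_load_explicit(&atoInt, memory_order_relaxed);
    rmw_result = atomic_fetch_sub_explicit(&atoInt, 1, memory_order_relaxed);
    return NULL;
}

// Hypothetical helper: one round; reports what the load and the RMW saw.
// Error handling for pthread_create/pthread_join omitted for brevity.
void run_fenced_scenario(int *load_seen, int *rmw_seen) {
    pthread_t t0, t1;
    atomic_store_explicit(&atoInt, 0, memory_order_relaxed);
    atomic_store_explicit(&atoBool, false, memory_order_relaxed);
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t0, NULL, thread0, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    *load_seen = plain_load_result;
    *rmw_seen = rmw_result;
}
```

Equivalently, you can drop both fences and make the flag store memory_order_release and the spin-loop load memory_order_acquire.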