Memory ordering and RMW operations

Reposted by: bug小助手 | Updated: 2023-10-25 16:42:05

Suppose I make two relaxed modifications to two atomic objects in thread0, one to each object, and then make thread1 observe the modification that came second in thread0.
Without memory fences (no release on the store and no acquire on the observing load), a plain load of the first object in thread1 might not return the newly stored value. But what if thread1 performs an atomic read-modify-write operation on it instead?

#include <stdatomic.h>
#include <stddef.h>
#include <assert.h>
int _Atomic atoInt = 0;
_Atomic _Bool atoBool= 0;

void thread0(void){
    atomic_store_explicit(&atoInt,42,memory_order_relaxed);

    atomic_signal_fence(memory_order_seq_cst); //prevent compiler reordering

    atomic_store_explicit(&atoBool,1,memory_order_relaxed);
}

void thread1(void){
    //observe the 0 => 1 transition
    while(0==(atomic_load_explicit(&atoBool,memory_order_relaxed))){}

    atomic_signal_fence(memory_order_seq_cst); //prevent compiler reordering

    //could still get the stale value because release/acquire wasn't used
    assert(
        0==atomic_load_explicit(&atoInt,memory_order_relaxed)
        || 42==atomic_load_explicit(&atoInt,memory_order_relaxed)
    );

    //should hold regardless because it's an RMW?
    assert(42==atomic_fetch_sub_explicit(&atoInt,1,memory_order_relaxed));
}

Based on my understanding of how cache coherence and fences work, I believe the RMW operation must necessarily get the new value.

Is this correct?

More answers

I don't think anything in the ISO C standard formally guarantees that an RMW would see the stored value. With compile-time reordering of the stores (since they're also both relaxed), it's super easy for the assert to fail even on a strongly-ordered machine like x86.
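
To make the compile-time reordering point concrete: when both stores are relaxed and nothing separates them, the as-if rule lets a compiler emit them in either order. The sketch below writes out one such permitted result by hand. The function names are hypothetical, and this is an illustration of what is allowed, not a claim that any particular compiler does it.

```c
#include <assert.h>
#include <stdatomic.h>

static _Atomic int atoInt = 0;
static _Atomic _Bool atoBool = 0;

/* Source order: data first, flag second, both relaxed, no fence between. */
void thread0_as_written(void) {
    atomic_store_explicit(&atoInt, 42, memory_order_relaxed);
    atomic_store_explicit(&atoBool, 1, memory_order_relaxed);
}

/* One ordering a compiler may emit instead (hand-written illustration):
 * the flag becomes visible before the data, on any hardware. */
void thread0_as_compiler_may_emit(void) {
    atomic_store_explicit(&atoBool, 1, memory_order_relaxed);
    atomic_store_explicit(&atoInt, 42, memory_order_relaxed);
}

/* Single-threaded check (hypothetical helper): both versions leave the
 * same final state, so the calling thread cannot tell them apart. */
void check_same_final_state(void) {
    thread0_as_written();
    assert(atomic_load(&atoInt) == 42 && atomic_load(&atoBool) == 1);

    atomic_store(&atoInt, 0);
    atomic_store(&atoBool, 0);

    thread0_as_compiler_may_emit();
    assert(atomic_load(&atoInt) == 42 && atomic_load(&atoBool) == 1);
}
```

Only another thread can observe the difference: with the second version, a reader that sees atoBool == 1 can still find atoInt == 0, whether it uses a plain load or an RMW.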

A more interesting computer-architecture question is whether any real or hypothetical ISA could break dependency ordering for loads but not for RMWs. (Dependency ordering is the HW behaviour that most ISAs have and that memory_order_consume was intended to expose to programmers. That design proved impractical, so real code like Linux RCU effectively uses relaxed loads, written carefully so that the compiler can't replace the load result with a constant or otherwise break the data dependency, e.g. by turning it into a control dependency (a branch).)
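
As a rough illustration of the dependency-ordered pattern described here, the sketch below publishes a pointer and has the reader dereference the loaded value. It uses ISO C11 atomics with memory_order_consume rather than the Linux kernel's own macros, and the names `struct config`, `publish`, and `read_published` are hypothetical. Current compilers are generally understood to treat consume as acquire, so this shows the shape of the pattern, not a performance technique.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct config { int value; };

static struct config slot;                      /* payload storage */
static struct config *_Atomic published = NULL; /* pointer readers follow */

/* Writer side: fill in the payload, then publish the pointer with release
 * so the payload write is ordered before the pointer store. */
void publish(int value) {
    assert(value >= 0); /* -1 is reserved for "nothing published yet" */
    slot.value = value;
    atomic_store_explicit(&published, &slot, memory_order_release);
}

/* Reader side: the address of p->value depends on the loaded pointer,
 * which is the data dependency that consume is meant to order. */
int read_published(void) {
    struct config *p =
        atomic_load_explicit(&published, memory_order_consume);
    if (p == NULL) {
        return -1; /* nothing published yet */
    }
    return p->value;
}
```

The question's code has no such dependency: thread1 reads atoInt at a fixed address that does not come from the value loaded from atoBool, so dependency ordering alone would not order those two accesses.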

@PeterCordes Alright. And if compile-time reordering is prevented (I guess atomic_signal_fence(memory_order_seq_cst) should do it?)?

The only real-world ISA without dependency ordering is DEC Alpha (see "Memory order consume usage in C11"). Hypothetical things that could break dependency ordering include value prediction for loads. If that's what you're thinking of, then yeah, perhaps: an atomic RMW can't retire until its store commits to L1d cache, and stores need to be non-speculative, so all prior speculation has to be confirmed before the store side of an atomic RMW can commit.

atomic_signal_fence(seq_cst) would fix it for x86, but weakly-ordered ISAs could still commit the stores to L1d cache out of order unless you use atomic_thread_fence(release) between them, or make the second one a release store. Then you're left with the question of reordering dependent loads. ISO C11 I'm pretty sure doesn't promise anything unless you use a release store and at least a consume load, but the only real implementation that could even possibly be "weird" with release stores but relaxed loads is DEC Alpha. And then yeah, maybe not in practice with RMW.
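
The portable fix mentioned above (make the second store a release store and pair it with an acquire load) can be sketched as follows. This is a minimal sketch assuming POSIX threads are available; the names `data`, `ready`, and `run_release_acquire_demo` are hypothetical and not taken from the question's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static _Atomic int data = 0;
static _Atomic _Bool ready = 0;

/* Writer: the release store orders the earlier relaxed store before it. */
static void *writer(void *arg) {
    (void)arg;
    atomic_store_explicit(&data, 42, memory_order_relaxed);
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Reader: an acquire load that sees ready == 1 synchronizes with the
 * release store, so the store of 42 happens-before the RMW below. */
static void *reader(void *arg) {
    int *out = arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        /* spin until the flag is published */
    }
    *out = atomic_fetch_sub_explicit(&data, 1, memory_order_relaxed);
    return NULL;
}

/* Hypothetical harness: runs both threads once and returns the value
 * that the reader's RMW observed, or -1 if the reader could not start. */
int run_release_acquire_demo(void) {
    pthread_t w, r;
    int seen = -1;

    atomic_store(&data, 0);
    atomic_store(&ready, 0);

    if (pthread_create(&r, NULL, reader, &seen) != 0) {
        return -1;
    }
    if (pthread_create(&w, NULL, writer, NULL) != 0) {
        writer(NULL); /* fall back to publishing from this thread */
    } else {
        pthread_join(w, NULL);
    }
    pthread_join(r, NULL);
    return seen;
}
```

With this pairing the ISO C memory model itself guarantees that the RMW reads 42, so the result no longer depends on what a given ISA happens to do with relaxed operations.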
