c++ - Does atomic_thread_fence(memory_order_seq_cst) have the semantics of a full memory barrier?

Reposted. Author: 可可西里. Updated: 2023-11-01 18:37:44

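A full/general memory barrier is one where all the LOAD and STORE operations specified before the barrier appear to happen before all the LOAD and STORE operations specified after the barrier, with respect to the other components of the system.

According to cppreference, memory_order_seq_cst is equal to memory_order_acq_rel plus a single total modification order on all operations so tagged. But as far as I know, neither an acquire nor a release fence in C++11 enforces #StoreLoad (load after store) ordering. A release fence requires that no previous read/write can be reordered with any following write; an acquire fence requires that no following read/write can be reordered with any previous read. Please correct me if I am wrong ;)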
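
To make the acquire/release fence rules above concrete, here is a minimal sketch of the one pattern they do cover: publishing a plain value through a relaxed flag. The names publish_with_fences and demo_publish are hypothetical and not part of the original question. Note that the pattern needs only write-before-write ordering in the producer and read-before-read ordering in the consumer, so it says nothing about #StoreLoad.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the original post): publishes `payload`
// through a relaxed flag guarded by a release fence, and reads it back
// behind an acquire fence. Returns the value the consumer observed.
int publish_with_fences(int payload) {
    int data = 0;                      // plain, non-atomic payload
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        data = payload;                                       // earlier write
        std::atomic_thread_fence(std::memory_order_release);  // keeps it before the store below
        ready.store(true, std::memory_order_relaxed);
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // keeps the read below after the load above
        seen = data;                                          // no data race: the fences synchronize
    });

    producer.join();
    consumer.join();
    return seen;
}

// Usage sketch: the consumer must observe the published value.
void demo_publish() {
    assert(publish_with_fences(42) == 42);
}
```

The release fence synchronizes with the acquire fence because the consumer's relaxed load reads the value written by the producer's relaxed store, so the plain write to data happens before the plain read.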

Consider the following example:

atomic<int> x;
atomic<int> y;

y.store(1, memory_order_relaxed); //(1)
atomic_thread_fence(memory_order_seq_cst); //(2)
x.load(memory_order_relaxed); //(3)

Is an optimizing compiler allowed to reorder instruction (3) to before (1), so that the code effectively looks like this:
x.load(memory_order_relaxed);                //(3)
y.store(1, memory_order_relaxed); //(1)
atomic_thread_fence(memory_order_seq_cst); //(2)

If this is a valid transformation, then it proves that atomic_thread_fence(memory_order_seq_cst) does not necessarily encompass the semantics of a full barrier.

Best answer

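atomic_thread_fence(memory_order_seq_cst) always generates a full barrier.

  • x86_64: MFENCE
  • PowerPC: hwsync
  • Itanium: mf
  • ARMv7/ARMv8: dmb ish
  • MIPS64: sync

  • The main point: an observing thread can simply observe the operations in a different order, regardless of which fences are used in the observed thread.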

    Is it allowed by a optimizing compiler to reorder instruction (3) to before (1)?



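    No, it is not allowed. But for this to hold as globally visible behavior in a multithreaded program, one of the following must be true:
  • other threads use the same memory_order_seq_cst for their atomic read/write operations on these values
  • or other threads also use the same atomic_thread_fence(memory_order_seq_cst); between their load() and store(). This approach does not guarantee sequential consistency in general, though, because sequential consistency is a stronger guarantee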
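
The second condition can be sketched as a store-buffering litmus test in which both threads place a seq_cst fence between their relaxed store and relaxed load. This is an illustration only: the function names are hypothetical, and a clean run demonstrates the rule rather than proving it. The guarantee itself comes from the standard's total order over seq_cst fences.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus-test helper (names are not from the original post).
// Runs the store-buffering pattern once and reports whether both relaxed
// loads returned 0, the outcome that StoreLoad reordering would produce.
bool both_loads_saw_zero() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        y.store(1, std::memory_order_relaxed);                // (1)
        std::atomic_thread_fence(std::memory_order_seq_cst);  // (2)
        r1 = x.load(std::memory_order_relaxed);               // (3)
    });
    std::thread t2([&] {
        x.store(1, std::memory_order_relaxed);                // mirror image of (1)
        std::atomic_thread_fence(std::memory_order_seq_cst);  // mirror image of (2)
        r2 = y.load(std::memory_order_relaxed);               // mirror image of (3)
    });

    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome appeared.
int count_forbidden_outcomes(int iterations) {
    int count = 0;
    for (int i = 0; i < iterations; ++i) {
        if (both_loads_saw_zero()) ++count;
    }
    return count;
}

// Usage sketch: with seq_cst fences in both threads the count must be 0.
void demo_store_buffering() {
    assert(count_forbidden_outcomes(200) == 0);
}
```

One of the two fences comes first in the total order of seq_cst fences, so the load that follows the later fence must observe the store that precedes the earlier one. With memory_order_acq_rel fences instead, the outcome r1 == 0 && r2 == 0 would be permitted.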

  • Working Draft, Standard for Programming Language C++, 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf

    § 29.3 Order and consistency

    § 29.3 / 8

    [ Note: memory_order_seq_cst ensures sequential consistency only for a program that is free of data races and uses exclusively memory_order_seq_cst operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In particular, memory_order_seq_cst fences ensure a total order only for the fences themselves. Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering specifications. — end note ]



    How this maps to assembly:

    Case 1:
    atomic<int> x, y;

    y.store(1, memory_order_relaxed); //(1)
    atomic_thread_fence(memory_order_seq_cst); //(2)
    x.load(memory_order_relaxed); //(3)

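    This code is not always equivalent in meaning to Case 2, but between the STORE and the LOAD it produces the same instructions as when both the STORE and the LOAD use memory_order_seq_cst. That is sequential consistency, which prevents StoreLoad reordering. Case 2: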
    atomic<int> x, y;

    y.store(1, memory_order_seq_cst); //(1)

    x.load(memory_order_seq_cst); //(3)

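    With some caveats:
  • it may add duplicate instructions (as in the MIPS64 example below)
  • or it may use similar operations in the form of other instructions:
      • as in the alternative 3/4 mappings for x86_64, where the LOCK prefix flushes the store buffer exactly as MFENCE does, preventing StoreLoad reordering
      • or on ARMv8, where we know that DMB ISH is a full barrier that prevents StoreLoad reordering: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CHDGACJD.html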

  • Guide for ARMv8-A

    Table 13.1. Barrier parameters

    ISH Any - Any

    Any - Any This means that both loads and stores must complete before the barrier. Both loads and stores that appear after the barrier in program order must wait for the barrier to complete.



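    Reordering of two instructions can be prevented by additional instructions placed between them. As we can see, the instructions generated between a first STORE(seq_cst) and the following LOAD(seq_cst) are the same as those generated for FENCE(seq_cst), that is, atomic_thread_fence(memory_order_seq_cst).

    Mapping of C/C++11 memory_order_seq_cst to different CPU architectures for load(), store(), and atomic_thread_fence():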

    Note that atomic_thread_fence(memory_order_seq_cst); always generates a full barrier:
  • x86_64: STORE: MOV (into memory), MFENCE | LOAD: MOV (from memory) | fence: MFENCE
  • x86_64-alt: STORE: MOV (into memory) | LOAD: MFENCE, MOV (from memory) | fence: MFENCE
  • x86_64-alt3: STORE: (LOCK) XCHG | LOAD: MOV (from memory) | fence: MFENCE (full barrier)
  • x86_64-alt4: STORE: MOV (into memory) | LOAD: LOCK XADD(0) | fence: MFENCE (full barrier)
  • PowerPC: STORE: hwsync; st | LOAD: hwsync; ld; cmp; bc; isync | fence: hwsync
  • Itanium: STORE: st.rel; mf | LOAD: ld.acq | fence: mf
  • ARMv7: STORE: dmb ish; str; dmb ish | LOAD: ldr; dmb ish | fence: dmb ish
  • ARMv7-alt: STORE: dmb ish; str | LOAD: dmb ish; ldr; dmb ish | fence: dmb ish
  • ARMv8 (AArch32): STORE: STL | LOAD: LDA | fence: DMB ISH (full barrier)
  • ARMv8 (AArch64): STORE: STLR | LOAD: LDAR | fence: DMB ISH (full barrier)
  • MIPS64: STORE: sync; sw; sync; | LOAD: sync; lw; sync; | fence: sync

  • All mappings of C/C++11 semantics to different CPU architectures for load(), store(), and atomic_thread_fence() are described here: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
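
To compare these mappings with what a particular compiler actually emits, one option is to isolate each operation in its own function and read the generated assembly. The sketch below uses hypothetical names (shared_value, seq_cst_store, seq_cst_load, seq_cst_fence, check_roundtrip) and assumes nothing about the target beyond standard C++11 atomics.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names, chosen only for this sketch.
std::atomic<int> shared_value{0};

// One function per operation, so each maps to one column of the list above.
void seq_cst_store(int v) {
    shared_value.store(v, std::memory_order_seq_cst);
}

int seq_cst_load() {
    return shared_value.load(std::memory_order_seq_cst);
}

void seq_cst_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Single-threaded sanity check: a seq_cst load that follows a seq_cst
// store and fence in program order returns the stored value.
void check_roundtrip() {
    seq_cst_store(7);
    seq_cst_fence();
    assert(seq_cst_load() == 7);
}
```

Compiling this file with optimization and assembly output enabled (for example g++ -O2 -S, assuming a GCC toolchain) shows the instruction chosen for each function. The result depends on the compiler, its version, and the target, and may correspond to one of the alternative mappings rather than the first row for that architecture.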

    Because sequential consistency prevents StoreLoad reordering, and because the instructions generated between a store(memory_order_seq_cst) and the next load(memory_order_seq_cst) are the same as those generated for atomic_thread_fence(memory_order_seq_cst), atomic_thread_fence(memory_order_seq_cst) prevents StoreLoad reordering.

    Regarding "c++ - Does atomic_thread_fence(memory_order_seq_cst) have the semantics of a full memory barrier?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25478029/
