
ios - What does dispatch_atomic_maximally_synchronizing_barrier(); mean?


I recently read a blog post from mikeash that walks through the implementation of dispatch_once in detail. I also got its source code from macosforge.

I understand most of the code except for this line:

dispatch_atomic_maximally_synchronizing_barrier();

It is a macro, defined as:

#define dispatch_atomic_maximally_synchronizing_barrier() \
	do { unsigned long _clbr; __asm__ __volatile__( \
		"cpuid" \
		: "=a" (_clbr) : "0" (0) : "rbx", "rcx", "rdx", "cc", "memory" \
	); } while(0)

I know it is used to "defeat the speculative read-ahead of peer CPUs", but I don't understand cpuid and the words that follow it. I know very little about assembly language.

Could someone explain this in detail for me? Many thanks.

Best Answer

The libdispatch source code pretty much explains it.

http://opensource.apple.com/source/libdispatch/libdispatch-442.1.4/src/shims/atomic.h

// see comment in dispatch_once.c
#define dispatch_atomic_maximally_synchronizing_barrier() \

http://opensource.apple.com/source/libdispatch/libdispatch-442.1.4/src/once.c

// The next barrier must be long and strong.
//
// The scenario: SMP systems with weakly ordered memory models
// and aggressive out-of-order instruction execution.
//
// The problem:
//
// The dispatch_once*() wrapper macro causes the callee's
// instruction stream to look like this (pseudo-RISC):
//
// load r5, pred-addr
// cmpi r5, -1
// beq 1f
// call dispatch_once*()
// 1f:
// load r6, data-addr
//
// May be re-ordered like so:
//
// load r6, data-addr
// load r5, pred-addr
// cmpi r5, -1
// beq 1f
// call dispatch_once*()
// 1f:
//
// Normally, a barrier on the read side is used to workaround
// the weakly ordered memory model. But barriers are expensive
// and we only need to synchronize once! After func(ctxt)
// completes, the predicate will be marked as "done" and the
// branch predictor will correctly skip the call to
// dispatch_once*().
//
// A far faster alternative solution: Defeat the speculative
// read-ahead of peer CPUs.
//
// Modern architectures will throw away speculative results
// once a branch mis-prediction occurs. Therefore, if we can
// ensure that the predicate is not marked as being complete
// until long after the last store by func(ctxt), then we have
// defeated the read-ahead of peer CPUs.
//
// In other words, the last "store" by func(ctxt) must complete
// and then N cycles must elapse before ~0l is stored to *val.
// The value of N is whatever is sufficient to defeat the
// read-ahead mechanism of peer CPUs.
//
// On some CPUs, the most fully synchronizing instruction might
// need to be issued.

dispatch_atomic_maximally_synchronizing_barrier();

For the x86_64 and i386 architectures, it uses the cpuid instruction to flush the instruction pipeline, as @Michael mentioned. cpuid is a serializing instruction, which prevents memory reordering. On other architectures it falls back to __sync_synchronize.
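Annotated, the x86 variant works roughly as in the sketch below. This is a simplified rewrite, not the exact libdispatch definition: the name my_maximally_synchronizing_barrier is made up, it is written as a static inline function instead of the original macro, and the fallback branch simply assumes __sync_synchronize as described above. The comments explain the operands after cpuid that the question asks about.

/* Simplified sketch, not the exact libdispatch definition. */
static inline void my_maximally_synchronizing_barrier(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned long _clbr;
	__asm__ __volatile__(
		"cpuid"                /* serializing: the CPU retires everything
		                          before it and discards speculative work */
		: "=a" (_clbr)         /* "=a": cpuid overwrites eax; capture it
		                          here and throw the value away */
		: "0" (0)              /* "0": preload operand 0's register (eax)
		                          with 0, i.e. request cpuid leaf 0 */
		: "ebx", "ecx", "edx", /* cpuid also overwrites ebx/ecx/edx */
		  "cc",                /* conservatively declare flags clobbered */
		  "memory");           /* compiler barrier: no memory access may be
		                          cached or moved across the asm */
#else
	/* Other architectures: fall back to a full memory barrier. */
	__sync_synchronize();
#endif
}

The "=a"/"0" pair is just a way of saying "eax is both an input (0) and an output (whatever cpuid leaves there)"; nothing actually uses _clbr. The whole point of the instruction here is its serializing side effect, not its result.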

https://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Atomic-Builtins.html

__sync_synchronize (...)
This builtin issues a full memory barrier.

these builtins are considered a full barrier. That is, no memory operand will be moved across the operation, either forward or backward. Further, instructions will be issued as necessary to prevent the processor from speculating loads across the operation and from queuing stores after the operation.
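To see why the barrier sits where it does, here is a minimal sketch of a dispatch_once-style slow path. It is not the real once.c code: my_once_t and my_once are made-up names, the waiter/contention handling is omitted, and it reuses the my_maximally_synchronizing_barrier() sketch from above. It only shows that the barrier runs after func(ctxt)'s last store and before the predicate is published as ~0l.

#include <stdatomic.h>

typedef atomic_long my_once_t;   /* 0 = not yet run, ~0l = done */

void my_once(my_once_t *pred, void (*func)(void *), void *ctxt)
{
	/* Fast path: a plain load and compare, no barrier at all. */
	if (atomic_load_explicit(pred, memory_order_relaxed) == ~0l)
		return;

	/* Slow path (locking of concurrent callers omitted in this sketch). */
	func(ctxt);

	/* Let "N cycles" pass after func()'s last store before publishing the
	 * predicate, so that any peer CPU that speculatively read data-addr
	 * too early has had that speculation discarded by the time it can
	 * observe *pred == ~0l. */
	my_maximally_synchronizing_barrier();

	atomic_store_explicit(pred, ~0l, memory_order_relaxed);
}

The real dispatch_once implementation additionally records waiting threads and wakes them up, but the ordering of the three steps above (run the initializer, issue the long barrier, then store the predicate) is exactly what the quoted comment is describing.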

The original question, "ios - What does dispatch_atomic_maximally_synchronizing_barrier(); mean?", can be found on Stack Overflow: https://stackoverflow.com/questions/27562334/
