- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
最近我读了blog来自 mikeash,它讲述了 dispatch_once
的详细实现。我还在 macosforge 中获得了它的源代码
我理解除了这一行之外的大部分代码:
dispatch_atomic_maximally_synchronizing_barrier();
它是一个宏并且定义了:
#define dispatch_atomic_maximally_synchronizing_barrier() \
do { unsigned long _clbr; __asm__ __volatile__( \
"cpuid" \
: "=a" (_clbr) : "0" (0) : "rbx", "rcx", "rdx", "cc", "memory" \
); } while(0)
我知道它是用来确保它“击败对等 CPU 的推测性预读”,但我不知道 cpuid
和后面的单词。我对汇编语言知之甚少。
有人可以为我详细说明吗?非常感谢。
最佳答案
libdispatch 源代码几乎解释了它。
http://opensource.apple.com/source/libdispatch/libdispatch-442.1.4/src/shims/atomic.h
// see comment in dispatch_once.c
#define dispatch_atomic_maximally_synchronizing_barrier() \
http://opensource.apple.com/source/libdispatch/libdispatch-442.1.4/src/once.c
// The next barrier must be long and strong.
//
// The scenario: SMP systems with weakly ordered memory models
// and aggressive out-of-order instruction execution.
//
// The problem:
//
// The dispatch_once*() wrapper macro causes the callee's
// instruction stream to look like this (pseudo-RISC):
//
// load r5, pred-addr
// cmpi r5, -1
// beq 1f
// call dispatch_once*()
// 1f:
// load r6, data-addr
//
// May be re-ordered like so:
//
// load r6, data-addr
// load r5, pred-addr
// cmpi r5, -1
// beq 1f
// call dispatch_once*()
// 1f:
//
// Normally, a barrier on the read side is used to workaround
// the weakly ordered memory model. But barriers are expensive
// and we only need to synchronize once! After func(ctxt)
// completes, the predicate will be marked as "done" and the
// branch predictor will correctly skip the call to
// dispatch_once*().
//
// A far faster alternative solution: Defeat the speculative
// read-ahead of peer CPUs.
//
// Modern architectures will throw away speculative results
// once a branch mis-prediction occurs. Therefore, if we can
// ensure that the predicate is not marked as being complete
// until long after the last store by func(ctxt), then we have
// defeated the read-ahead of peer CPUs.
//
// In other words, the last "store" by func(ctxt) must complete
// and then N cycles must elapse before ~0l is stored to *val.
// The value of N is whatever is sufficient to defeat the
// read-ahead mechanism of peer CPUs.
//
// On some CPUs, the most fully synchronizing instruction might
// need to be issued.
dispatch_atomic_maximally_synchronizing_barrier();
对于 x86_64 和 i386 架构,它使用 cpuid 指令来刷新指令管道,如 @Michael 提到的。 cpuid 是序列化指令以防止内存重新排序。以及其他架构的 __sync_synchronize。
https://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Atomic-Builtins.html
__sync_synchronize (...)
This builtin issues a full memory barrier.
these builtins are considered a full barrier. That is, no memory operand will be moved across the operation, either forward or backward. Further, instructions will be issued as necessary to prevent the processor from speculating loads across the operation and from queuing stores after the operation.
关于ios - 什么是 dispatch_atomic_maximally_synchronizing_barrier();意思是?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/27562334/
最近我读了blog来自 mikeash,它讲述了 dispatch_once 的详细实现。我还在 macosforge 中获得了它的源代码 我理解除了这一行之外的大部分代码: dispatch_ato
我是一名优秀的程序员,十分优秀!