c++ - Is CMPXCHG16B correct? - 6ren

Reposted by: 可可西里 · Updated: 2023-11-01 17:08:29

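This doesn't seem to work quite right, although I'm not sure why. Any advice would be great, since the documentation for CMPXCHG16B is pretty thin (I don't have any of the Intel manuals...).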

template<>
inline bool cas(volatile types::uint128_t *src, types::uint128_t cmp, types::uint128_t with)
{
    /*
    Description:
     The CMPXCHG16B instruction compares the 128-bit value in the RDX:RAX and RCX:RBX registers 
     with a 128-bit memory location. If the values are equal, the zero flag (ZF) is set, 
     and the RCX:RBX value is copied to the memory location. 
     Otherwise, the ZF flag is cleared, and the memory value is copied to RDX:RAX.
     */
    uint64_t * cmpP = (uint64_t*)&cmp;
    uint64_t * withP = (uint64_t*)&with;
    unsigned char result = 0;
    __asm__ __volatile__ (
    "LOCK; CMPXCHG16B %1\n\t"
    "SETZ %b0\n\t"
    : "=q"(result)  /* output */ 
    : "m"(*src), /* input */
      //what to compare against
      "rax"( ((uint64_t) (cmpP[1])) ), //lower bits
      "rdx"( ((uint64_t) (cmpP[0])) ),//upper bits
      //what to replace it with if it was equal
      "rbx"( ((uint64_t) (withP[1])) ), //lower bits
      "rcx"( ((uint64_t) (withP[0]) ) )//upper bits
    : "memory", "cc", "rax", "rdx", "rbx","rcx" /* clobbered items */
    );
    return result;
}

When I run the example I get 0, when it should be 1. Any ideas?
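The Description comment in the question's code can be restated as a plain C++ model that makes the register roles explicit: only RDX:RAX is compared with memory, and RCX:RBX is just the replacement value. This is a minimal, non-atomic sketch for reasoning about the semantics, not a substitute for the instruction; `Regs` and `cmpxchg16b_model` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical register file for the model; field names mirror x86-64 registers.
struct Regs {
    uint64_t rax, rdx;  // RDX:RAX = expected value (compared with memory)
    uint64_t rbx, rcx;  // RCX:RBX = replacement value (only read)
    bool zf;            // zero flag: set on success, cleared on failure
};

// Non-atomic model of "cmpxchg16b m128". mem[0] is the low 64 bits and
// mem[1] the high 64 bits (little-endian x86 layout). Real hardware performs
// this atomically only with the LOCK prefix.
inline void cmpxchg16b_model(uint64_t mem[2], Regs& r) {
    if (mem[0] == r.rax && mem[1] == r.rdx) {
        r.zf = true;
        mem[0] = r.rbx;  // RCX:RBX -> memory
        mem[1] = r.rcx;
    } else {
        r.zf = false;
        r.rax = mem[0];  // memory -> RDX:RAX
        r.rdx = mem[1];
    }
}
```

Read against this model, RAX and RDX must carry the low and high halves of `cmp`, and RBX and RCX the low and high halves of `with`.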

Best answer

I noticed a few problems:

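(1) The main problem is the constraints: "rax" doesn't do what it looks like it does. Each letter in a constraint string is its own constraint and the operand may satisfy any of them, so the leading "r" alone lets gcc pick any general-purpose register. To bind an operand to a specific register, use the single-letter constraints "a" (RAX), "d" (RDX), "b" (RBX) and "c" (RCX).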

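(2) I'm not sure how your types::uint128_t is stored, but assuming the standard little-endian layout on x86, the high and low 64-bit halves are also swapped: cmpP[0] and withP[0] hold the low halves and belong in RAX and RBX, while cmpP[1] and withP[1] hold the high halves and belong in RDX and RCX.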

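(3) Taking the address of an object and casting it to a pointer to a different type can violate the strict-aliasing rules. Whether that is a problem depends on how types::uint128_t is defined (a struct of two uint64_t is fine). GCC at -O2 optimizes on the assumption that the aliasing rules are not violated, so code that breaks them may not behave as written.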

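(4) *src should really be marked as an output operand rather than relying on a "memory" clobber, although that is more a performance issue than a correctness one. Similarly, RBX and RCX don't need to be listed as clobbered, since CMPXCHG16B only reads them. RAX and RDX, on the other hand, are overwritten when the comparison fails, so they should be read-write operands.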
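Point (1) can be seen in a tiny standalone sketch, assuming GCC or Clang targeting x86-64: the single-letter constraints "a" and "=d" bind the input to RAX and the output to RDX, so naming those registers directly in the template is safe. The function name `copy_rax_to_rdx` is hypothetical.

```cpp
#include <cstdint>

// "a" binds the input operand to RAX and "=d" binds the output to RDX, so the
// template may name those registers directly. A constraint string such as
// "rax" would instead let the compiler use any general-purpose register,
// because its leading 'r' is a constraint of its own.
inline uint64_t copy_rax_to_rdx(uint64_t in) {
    uint64_t out;
    __asm__("mov %%rax, %%rdx" : "=d"(out) : "a"(in));
    return out;
}
```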
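For point (2), the memory layout can be checked directly. This sketch assumes GCC or Clang on x86-64 (`unsigned __int128` is a compiler extension) and uses `std::memcpy` to read the two 64-bit halves in memory order; `half_at` is a hypothetical helper. Index 0 (the lower address) holds the low 64 bits, which is why it belongs in RAX/RBX rather than RDX/RCX.

```cpp
#include <cstdint>
#include <cstring>

// Returns the 64-bit half stored at index i (0 = lower address) of v's
// in-memory representation. std::memcpy copies bytes, so unlike casting
// &v to uint64_t* it does not raise a strict-aliasing question.
inline uint64_t half_at(unsigned __int128 v, int i) {
    uint64_t halves[2];
    std::memcpy(halves, &v, sizeof halves);
    return halves[i];
}
```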
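For point (3), one way to sidestep the aliasing question entirely is to convert between a 128-bit integer and a two-field struct by value, using shifts and integer casts instead of pointer casts. A minimal sketch, assuming GCC or Clang on x86-64; `U128`, `split` and `join` are hypothetical names, and `U128` mirrors the lo/hi struct used in the answer's working version.

```cpp
#include <cstdint>

// Hypothetical two-field representation of a 128-bit value.
struct U128 {
    uint64_t lo;
    uint64_t hi;
};

// Value-based conversions: shifts and truncating casts never reinterpret
// memory through a pointer of another type, so strict aliasing is not involved.
inline U128 split(unsigned __int128 v) {
    return U128{static_cast<uint64_t>(v), static_cast<uint64_t>(v >> 64)};
}

inline unsigned __int128 join(U128 p) {
    return (static_cast<unsigned __int128>(p.hi) << 64) | p.lo;
}
```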
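Point (4) in isolation: when an asm statement touches a single memory location, a "+m" operand tells the compiler exactly what is read and written, so a blanket "memory" clobber is not needed (at the cost of the asm no longer acting as a compile-time barrier for other memory). A minimal sketch, assuming GCC or Clang on x86-64; `atomic_inc` is a hypothetical name.

```cpp
#include <cstdint>

// Atomically increments *p. "+m" declares *p as both read and written, which
// is all the compiler needs to know here; "cc" because INC modifies flags.
inline void atomic_inc(volatile uint64_t* p) {
    __asm__ __volatile__("lock incq %0" : "+m"(*p) : : "cc");
}
```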

Here is a working version:

#include <stdint.h>

namespace types
{
    // alternative: union with  unsigned __int128
    struct uint128_t
    {
        uint64_t lo;
        uint64_t hi;
    }
    __attribute__ (( __aligned__( 16 ) ));
}

template< class T > inline bool cas( volatile T * src, T cmp, T with );

template<> inline bool cas( volatile types::uint128_t * src, types::uint128_t cmp, types::uint128_t with )
{
    // cmp can be by reference so the caller's value is updated on failure.

    // suggestion: use __sync_bool_compare_and_swap and compile with -mcx16 instead of inline asm
    bool result;
    __asm__ __volatile__
    (
        "lock cmpxchg16b %1\n\t"
        "setz %0"       // on gcc6 and later, use a flag output constraint instead
        : "=q" ( result )
        , "+m" ( *src )
        , "+d" ( cmp.hi )
        , "+a" ( cmp.lo )
        : "c" ( with.hi )
        , "b" ( with.lo )
        : "cc", "memory" // compile-time memory barrier.  Omit if you want memory_order_relaxed compile-time ordering.
    );
    return result;
}

int main()
{
    using namespace types;
    uint128_t test = { 0xdecafbad, 0xfeedbeef };
    uint128_t cmp = test;
    uint128_t with = { 0x55555555, 0xaaaaaaaa };
    return ! cas( & test, cmp, with );
}
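The comment in the working version suggests a flag output constraint on gcc 6 and later. A sketch of that variant, assuming GCC 6+ or a recent Clang on x86-64: "=@ccz" asks the compiler to produce ZF directly as a bool, replacing the separate `setz`. The struct is 16-byte aligned because CMPXCHG16B faults on a misaligned operand, and `cmp` is taken by reference so the caller sees the current value after a failed exchange. The names `U128` and `cas128` are hypothetical.

```cpp
#include <cstdint>

// 16-byte alignment is required: CMPXCHG16B raises #GP on a misaligned operand.
struct alignas(16) U128 {
    uint64_t lo;
    uint64_t hi;
};

// Returns true if *src equaled cmp and was replaced by with. On failure,
// cmp is updated with the value currently in *src.
inline bool cas128(volatile U128* src, U128& cmp, U128 with) {
    bool ok;
    __asm__ __volatile__(
        "lock cmpxchg16b %1"
        : "=@ccz"(ok),   // ZF as a bool output (flag output constraint)
          "+m"(*src),    // memory operand, read and written
          "+a"(cmp.lo),  // RAX: low half of expected value
          "+d"(cmp.hi)   // RDX: high half of expected value
        : "b"(with.lo),  // RBX: low half of replacement
          "c"(with.hi)   // RCX: high half of replacement
        : "memory");     // compile-time barrier, as in the answer
    return ok;
}
```

Because `cmp` is refreshed on failure, a retry loop can call `cas128` again without re-reading `*src` itself.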

A similar question about "c++ - Is CMPXCHG16B correct?" can be found on Stack Overflow: https://stackoverflow.com/questions/4825400/
