
c++ - CMPXCHG16B correct?

Repost. Author: 可可西里. Updated: 2023-11-01 17:08:29

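This doesn't seem to be exactly right, although I am unsure why. Advice would be great, as the documentation for CMPXCHG16B is pretty minimal (I don't own any Intel manuals...)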

template<>
inline bool cas(volatile types::uint128_t *src, types::uint128_t cmp, types::uint128_t with)
{
/*
Description:
The CMPXCHG16B instruction compares the 128-bit value in the RDX:RAX and RCX:RBX registers
with a 128-bit memory location. If the values are equal, the zero flag (ZF) is set,
and the RCX:RBX value is copied to the memory location.
Otherwise, the ZF flag is cleared, and the memory value is copied to RDX:RAX.
*/
uint64_t * cmpP = (uint64_t*)&cmp;
uint64_t * withP = (uint64_t*)&with;
unsigned char result = 0;
__asm__ __volatile__ (
"LOCK; CMPXCHG16B %1\n\t"
"SETZ %b0\n\t"
: "=q"(result) /* output */
: "m"(*src), /* input */
//what to compare against
"rax"( ((uint64_t) (cmpP[1])) ), //lower bits
"rdx"( ((uint64_t) (cmpP[0])) ),//upper bits
//what to replace it with if it was equal
"rbx"( ((uint64_t) (withP[1])) ), //lower bits
"rcx"( ((uint64_t) (withP[0]) ) )//upper bits
: "memory", "cc", "rax", "rdx", "rbx","rcx" /* clobbered items */
);
return result;
}

When I run an example, I get 0 when it should be 1. Any ideas?

Best answer

I noticed a few issues:

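(1) The main problem is the constraints. "rax" does not do what it looks like: GCC acts on the first character, "r", which lets it pick any general-purpose register. To pin an operand to a specific register, use the single-letter constraints "a", "b", "c" and "d".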

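(2) I'm not sure how your types::uint128_t is stored, but assuming the standard little-endian layout on x86, the high and low 64-bit halves are swapped as well: the word at index 0 is the low half, not the high half.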

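(3) Taking the address of an object and casting it to a pointer to a different type can break the aliasing rules. Whether that is a problem here depends on how your types::uint128_t is defined (it is fine if it is a struct of two uint64_t). GCC with -O2 optimizes on the assumption that the aliasing rules are not violated.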

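(4) *src should really be marked as an output rather than relying on the memory clobber, although that is more of a performance issue than a correctness issue. Similarly, rbx and rcx do not need to be listed as clobbered.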
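Points (2) and (3) can be checked without any inline asm. The following is a minimal sketch, assuming a GCC or Clang target that provides the `unsigned __int128` extension; the names `Halves`, `split_by_shift`, `words_in_memory_order` and `low_half_comes_first` are illustrative and do not come from the question or the answer. It splits a 128-bit value by shifting, which is independent of byte order, and uses `memcpy` rather than a pointer cast to see which half comes first in memory.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the two 64-bit halves of a 128-bit value.
struct Halves {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Split by arithmetic: independent of byte order and free of pointer casts.
inline Halves split_by_shift(unsigned __int128 v)
{
    return Halves{ static_cast<std::uint64_t>(v),
                   static_cast<std::uint64_t>(v >> 64) };
}

// Copy the object representation into two 64-bit words, in memory order.
// memcpy is the aliasing-safe alternative to casting &v to uint64_t*.
inline void words_in_memory_order(unsigned __int128 v, std::uint64_t out[2])
{
    static_assert(sizeof(v) == 2 * sizeof(std::uint64_t),
                  "expected a 16-byte type");
    std::memcpy(out, &v, sizeof(v));
}

// True when word 0 in memory holds the low half, as on little-endian x86-64.
inline bool low_half_comes_first()
{
    const unsigned __int128 v =
        (static_cast<unsigned __int128>(0x1111) << 64) | 0x2222;
    std::uint64_t w[2];
    words_in_memory_order(v, w);
    return w[0] == split_by_shift(v).lo;
}
```

On a little-endian machine `low_half_comes_first()` returns true, so in the question's code the word labelled "lower bits" (index 1) is actually the upper half.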

Here is a working version:

#include <stdint.h>

namespace types
{
// alternative: union with unsigned __int128
struct uint128_t
{
uint64_t lo;
uint64_t hi;
}
__attribute__ (( __aligned__( 16 ) ));
}

template< class T > inline bool cas( volatile T * src, T cmp, T with );

template<> inline bool cas( volatile types::uint128_t * src, types::uint128_t cmp, types::uint128_t with )
{
// cmp can be by reference so the caller's value is updated on failure.

// suggestion: use __sync_bool_compare_and_swap and compile with -mcx16 instead of inline asm
bool result;
__asm__ __volatile__
(
"lock cmpxchg16b %1\n\t"
"setz %0" // on gcc6 and later, use a flag output constraint instead
: "=q" ( result )
, "+m" ( *src )
, "+d" ( cmp.hi )
, "+a" ( cmp.lo )
: "c" ( with.hi )
, "b" ( with.lo )
: "cc", "memory" // compile-time memory barrier. Omit if you want memory_order_relaxed compile-time ordering.
);
return result;
}

int main()
{
using namespace types;
uint128_t test = { 0xdecafbad, 0xfeedbeef };
uint128_t cmp = test;
uint128_t with = { 0x55555555, 0xaaaaaaaa };
return ! cas( & test, cmp, with );
}
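The comments in the working version mention two refinements: taking cmp by reference so that the caller sees the current memory value on failure, and using a flag output constraint on GCC 6 and later instead of SETZ. A minimal sketch of both follows, assuming x86-64 and GCC or a Clang version that defines `__GCC_ASM_FLAG_OUTPUTS__`. The names `u128` and `cas128` are illustrative and not part of the original answer, and the alignment assert is an added assumption about how callers allocate the target.

```cpp
#include <cassert>
#include <cstdint>

#if !defined(__x86_64__)
#error "This sketch is specific to x86-64."
#endif

// Hypothetical type for illustration: low half first (little-endian order),
// aligned to 16 bytes as CMPXCHG16B requires for its memory operand.
struct alignas(16) u128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Returns true if *dst equalled `expected` and was replaced by `desired`.
// On failure, `expected` is overwritten with the value found in memory,
// because CMPXCHG16B loads it into RDX:RAX.
inline bool cas128(volatile u128* dst, u128& expected, u128 desired)
{
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    bool ok;
#ifdef __GCC_ASM_FLAG_OUTPUTS__
    // "=@ccz" makes ZF itself the output, so no SETZ is needed.
    __asm__ __volatile__(
        "lock cmpxchg16b %1"
        : "=@ccz"(ok), "+m"(*dst), "+d"(expected.hi), "+a"(expected.lo)
        : "c"(desired.hi), "b"(desired.lo)
        : "memory");
#else
    // Fallback for compilers without flag outputs: materialize ZF with SETZ.
    __asm__ __volatile__(
        "lock cmpxchg16b %1\n\t"
        "setz %0"
        : "=q"(ok), "+m"(*dst), "+d"(expected.hi), "+a"(expected.lo)
        : "c"(desired.hi), "b"(desired.lo)
        : "cc", "memory");
#endif
    return ok;
}
```

Because `expected` is refreshed on failure, a retry loop can recompute `desired` from it and call `cas128` again without reloading `*dst` separately.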

Regarding "c++ - CMPXCHG16B correct?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/4825400/
