
c - How to compare the upper double floating-point elements with SSE

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:04:55

I'm looking for a way to compare the upper halves of two __m128d variables, so I searched https://software.intel.com/sites/landingpage/IntrinsicsGuide/ for relevant intrinsics.

But I could only find intrinsics that compare the lower halves of two variables, for example _mm_comieq_sd.

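I'd like to know why there are no intrinsics for comparing the upper halves and, more importantly, how to compare the upper halves of two __m128d variables.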


Update:

The code looks like this:

j0 = jprev0;
j1 = jprev1;

t_0 = p_i_x - pj_x_0;
t_1 = p_i_x - pj_x_1;
r2_0 = t_0 * t_0;
r2_1 = t_1 * t_1;

t_0 = p_i_y - pj_y_0;
t_1 = p_i_y - pj_y_1;
r2_0 += t_0 * t_0;
r2_1 += t_1 * t_1;

t_0 = p_i_z - pj_z_0;
t_1 = p_i_z - pj_z_1;
r2_0 += t_0 * t_0;
r2_1 += t_1 * t_1;

#if NAMD_ComputeNonbonded_SortAtoms != 0 && ( 0 PAIR ( + 1 ) )
sortEntry0 = sortValues + g;
sortEntry1 = sortValues + g + 1;
jprev0 = sortEntry0->index;
jprev1 = sortEntry1->index;
#else
jprev0 = glist[g ];
jprev1 = glist[g+1];
#endif

pj_x_0 = p_1[jprev0].position.x;
pj_x_1 = p_1[jprev1].position.x;
pj_y_0 = p_1[jprev0].position.y;
pj_y_1 = p_1[jprev1].position.y;
pj_z_0 = p_1[jprev0].position.z;
pj_z_1 = p_1[jprev1].position.z;

// want to use sse to compare those
bool test0 = ( r2_0 < groupplcutoff2 );
bool test1 = ( r2_1 < groupplcutoff2 );

//removing ifs benefits on many architectures
//as the extra stores will only warm the cache up
goodglist [ hu ] = j0;
goodglist [ hu + test0 ] = j1;

hu += test0 + test1;

I'm trying to rewrite it with SSE.

Best answer

You're asking how to compare the upper halves after already comparing the lower halves.
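For the literal question, here is a minimal sketch of two ways to test only the upper elements, using SSE2 intrinsics. The helper names upper_eq and upper_eq_mask are hypothetical and not part of the original answer. The first copies each upper element into the low position so the scalar _mm_comieq_sd can be used; the second does one packed compare and picks bit 1 of the resulting mask.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: returns 1 if the upper doubles of a and b are equal.
   _mm_unpackhi_pd(x, x) yields {x[1], x[1]}, so the scalar compare sees x[1]. */
int upper_eq(__m128d a, __m128d b)
{
    __m128d a_hi = _mm_unpackhi_pd(a, a);
    __m128d b_hi = _mm_unpackhi_pd(b, b);
    return _mm_comieq_sd(a_hi, b_hi);
}

/* Hypothetical alternative: packed compare, then extract bit 1 of the mask.
   Bit 0 of the movemask result is the lower element, bit 1 the upper one. */
int upper_eq_mask(__m128d a, __m128d b)
{
    int mask = _mm_movemask_pd(_mm_cmpeq_pd(a, b));
    return (mask >> 1) & 1;
}
```

The mask version is usually the better fit when both elements need testing anyway, since a single compare covers both.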

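The SIMD way to do comparisons is with packed-compare instructions, such as __m128d _mm_cmplt_pd (__m128d a, __m128d b), which produce a mask as output instead of setting flags. AVX has an improved vcmppd / vcmpps with a wider choice of comparison predicates, which you pass as a third argument: _mm_cmp_pd (__m128d a, __m128d b, const int imm8).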

const __m128d groupplcutoff2_vec = _mm_set1_pd(groupplcutoff2);
// broadcasts the scalar to both elements; with SSE3 enabled, compilers can emit movddup like _mm_movedup_pd() would.

__m128d r2 = ...;

// bool test0 = ( r2_0 < groupplcutoff2 );
// bool test1 = ( r2_1 < groupplcutoff2 );
__m128d ltvec = _mm_cmplt_pd(r2, groupplcutoff2_vec);
int ltmask = _mm_movemask_pd(ltvec);

bool test0 = ltmask & 1;
// bool test1 = ltmask & 2;

// assuming j is double. I'm not sure from your code, it might be int.
// and you're right, doing both stores unconditionally is prob. fastest, if your code isn't heavy on stores.
// goodglist [ hu ] = j0;
_mm_store_sd (&goodglist [ hu ], j);          // stores the low element of j; takes a double*
// goodglist [ hu + test0 ] = j1;
_mm_storeh_pd(&goodglist [ hu + test0 ], j);  // stores the high element of j; takes a double*
// don't try to use non-AVX _mm_maskmoveu_si128, it's like movnt. And doesn't do exactly what this needs, anyway, without shuffling j and ltvec.

// hu += test0 + test1;
hu += _popcnt32(ltmask); // Nehalem or later. Check the popcnt CPUID flag

The popcnt trick works just as well with AVX (4 doubles packed in a ymm register). Packed compare -> movemask followed by bit manipulation is a useful trick to keep in mind.
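Putting the pieces together, the following is a self-contained sketch of the compare -> movemask -> popcount pattern, not the original NAMD code. It assumes the squared distances are already in an array, that the element count is even, and that the stored indices are plain int. The function name filter_below_cutoff and its parameters are hypothetical. It uses the GCC/Clang builtin __builtin_popcount in place of _popcnt32 so that it compiles without enabling the popcnt target feature.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: writes to out[] every index j in [0, n) with
   r2[j] < cutoff, two elements per iteration, and returns how many were kept.
   out must have room for n ints, because both stores happen unconditionally. */
size_t filter_below_cutoff(const double *r2, size_t n, double cutoff, int *out)
{
    assert(n % 2 == 0);  /* assumption: no scalar tail handling in this sketch */

    const __m128d cutoff_vec = _mm_set1_pd(cutoff);
    size_t hu = 0;

    for (size_t j = 0; j < n; j += 2) {
        __m128d v = _mm_loadu_pd(r2 + j);
        /* bit 0: r2[j] < cutoff, bit 1: r2[j + 1] < cutoff */
        int ltmask = _mm_movemask_pd(_mm_cmplt_pd(v, cutoff_vec));
        int test0 = ltmask & 1;

        /* Branchless stores: a rejected entry is overwritten by a later one. */
        out[hu] = (int)j;
        out[hu + test0] = (int)(j + 1);

        hu += (size_t)__builtin_popcount((unsigned)ltmask);
    }
    return hu;
}
```

Called with r2 = {1.0, 9.0, 2.0, 3.0} and a cutoff of 4.0, this keeps indices 0, 2 and 3 and returns 3.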

Regarding "c - How to compare the upper double floating-point elements with SSE", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28101883/
