c - How to get the number of unique elements in a SIMD vector in C

Reposted. Author: 行者123. Updated: 2023-12-05 04:23:15

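Is there a fast way to count the number of unique elements in a SIMD vector (AVX and any SSE) without converting it to an array? I want to use it as an optimization in a specific bruteforcer, so I would like it to be as fast as possible.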

Currently I am doing:

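#include <emmintrin.h>  // SSE2: __m128i, _mm_store_si128
#include <stdalign.h>   // alignas (C11)

// v16n is not defined in the question; assumed here to be an alias for __m128i
typedef __m128i v16n;

// count the number of unique elements
int uniqueCount(v16n a) {
    alignas(16) unsigned char v[16];
    _mm_store_si128((v16n*)v, a);

    int count = 1;
    for(int i = 1; i < 16; i++) {
        int j;
        // look for an earlier byte equal to v[i]
        for(j = 0; j < i; j++)
            if(v[i] == v[j])
                break;

        // no earlier match: v[i] is a new value
        if(i == j) count++;
    }

    return count;
}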

Best answer

Here is one possible implementation. The code requires SSSE3 and SSE 4.1, and benefits slightly from AVX2 when available.

#include <stddef.h>     // size_t
#include <immintrin.h>  // SSSE3, SSE 4.1 and AVX2 intrinsics

// Count unique bytes in the vector
size_t countUniqueBytes( __m128i vec )
{
    size_t result = 1;
    // Accumulator for the bytes encountered so far, initialize with broadcasted first byte
#ifdef __AVX2__
    __m128i partial = _mm_broadcastb_epi8( vec );
#else
    __m128i partial = _mm_shuffle_epi8( vec, _mm_setzero_si128() );
#endif
    // Permutation vector to broadcast these bytes
    const __m128i one = _mm_set1_epi8( 1 );
    __m128i perm = one;

    // If you use GCC, uncomment following line and benchmark, may or may not help:
    // #pragma GCC unroll 1
    for( int i = 1; i < 16; i++ )
    {
        // Broadcast i-th byte from the source vector
        __m128i bc = _mm_shuffle_epi8( vec, perm );
        perm = _mm_add_epi8( perm, one );
        // Compare bytes with the partial vector
        __m128i eq = _mm_cmpeq_epi8( bc, partial );
        // Append current byte to the partial vector
        partial = _mm_alignr_epi8( bc, partial, 1 );
        // Increment result if the byte was not yet in the partial vector
        // Compilers are smart enough to do that with `sete` instruction, no branches
        int isUnique = _mm_testz_si128( eq, eq );
        result += ( isUnique ? (size_t)1 : (size_t)0 );
    }
    return result;
}
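
To make the data flow of the loop above easier to follow, here is a rough scalar model of it in plain C. This is only a sketch for illustration and for cross-checking results, not part of the original answer. The function name `countUniqueBytesModel` and the `window` array are hypothetical: `window` stands in for the `partial` vector, the inner comparison loop stands in for `_mm_cmpeq_epi8` followed by `_mm_testz_si128`, and the `memmove` stands in for `_mm_alignr_epi8` with a shift of 1.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical scalar model of the SIMD loop (illustration only).
// `window` plays the role of the `partial` vector in the answer.
size_t countUniqueBytesModel(const unsigned char v[16])
{
    unsigned char window[16];
    // Start with the first byte broadcast to every slot
    memset(window, v[0], sizeof window);
    size_t result = 1;

    for (int i = 1; i < 16; i++) {
        // Was byte i already seen? Models the byte-wise compare plus the all-zero test
        int seen = 0;
        for (int k = 0; k < 16; k++)
            seen |= (window[k] == v[i]);

        // Drop the lowest slot and append byte i at the top,
        // modelling the one-byte shift done by the alignr step
        memmove(window, window + 1, 15);
        window[15] = v[i];

        // Count the byte only if it was not in the window yet
        result += seen ? 0 : 1;
    }
    return result;
}
```

Each iteration drops one copy of the first byte from the window, and only 15 iterations run, so the first byte never falls out before the last comparison. That is why broadcasting it into all 16 slots at the start is safe. Running this model and the SIMD version on the same 16 bytes should give the same count, which may be a convenient sanity check when benchmarking variants.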

Regarding "c - How to get the number of unique elements in a SIMD vector in C", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73756653/
