
c++ - How do I disable vectorization in clang++?

Reposted. Author: 搜寻专家. Updated: 2023-10-31 00:52:31

Consider the following small search function:

#include <cstdint>

// Counts how many of the N elements starting at base are less than needle.
template <uint32_t N>
int32_t countsearch(const uint32_t *base, uint32_t needle) {
    uint32_t count = 0;
    #pragma clang loop vectorize(disable)
    for (const uint32_t *probe = base; probe < base + N; probe++) {
        if (*probe < needle)
            count++;
    }
    return count;
}

At -O2 or higher, clang vectorizes this search. For example, it produces code like this (for 10 elements):

int countsearch<10u>(unsigned int const*, unsigned int):            # @int countsearch<10u>(unsigned int const*, unsigned int)
vmovd xmm0, esi
vpbroadcastd ymm0, xmm0
vpbroadcastd ymm1, dword ptr [rip + .LCPI0_0] # ymm1 = [2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648]
vpxor ymm2, ymm1, ymmword ptr [rdi]
vpxor ymm0, ymm0, ymm1
vpcmpgtd ymm0, ymm0, ymm2
cmp dword ptr [rdi + 32], esi
vpsrld ymm1, ymm0, 31
vextracti128 xmm1, ymm1, 1
vpsubd ymm0, ymm1, ymm0
vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
vpaddd ymm0, ymm0, ymm1
vphaddd ymm0, ymm0, ymm0
vmovd eax, xmm0
adc eax, 0
cmp dword ptr [rdi + 36], esi
adc eax, 0
vzeroupper
ret

How can I disable this vectorization, either on the command line or with a #pragma in the code?

I tried the following command-line arguments, and none of them prevented the vectorization:

-disable-loop-vectorization 
-disable-vectorization
-fno-vectorize
-fno-tree-vectorize

I also tried #pragma clang loop vectorize(disable), as you can see in the code above, but without success.

Best answer

Turn off SLP vectorization:

clang++ -O2 -fno-slp-vectorize

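A likely explanation for why the other flags had no effect: with a small compile-time N, clang probably unrolls the loop completely, and the resulting straight-line code is then picked up by the SLP vectorizer rather than the loop vectorizer. -fno-vectorize and #pragma clang loop vectorize(disable) only control the loop vectorizer. The sketch below is a minimal translation unit for comparing the assembly with and without -fno-slp-vectorize. The file name, the -mavx2 flag, the sample data, and the check_countsearch helper are assumptions added for illustration and are not part of the original question.

```cpp
// Sketch only: the file name and flags below are illustrative assumptions.
// Compare the assembly produced by, for example:
//   clang++ -O2 -mavx2 -S countsearch.cpp
//   clang++ -O2 -mavx2 -fno-slp-vectorize -S countsearch.cpp
#include <cassert>
#include <cstdint>

template <uint32_t N>
int32_t countsearch(const uint32_t *base, uint32_t needle) {
    uint32_t count = 0;
    // With a small constant N this loop is likely fully unrolled, so
    // loop-level vectorization controls would no longer apply to it.
    for (const uint32_t *probe = base; probe < base + N; probe++) {
        if (*probe < needle)
            count++;
    }
    return count;
}

// Explicit instantiation so countsearch<10> shows up in the assembly output.
template int32_t countsearch<10>(const uint32_t *, uint32_t);

// Hypothetical sanity check: the flag should change code generation only,
// never the results.
inline void check_countsearch() {
    static const uint32_t sample[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    assert(countsearch<10>(sample, 0) == 0);    // nothing is below 0
    assert(countsearch<10>(sample, 5) == 5);    // 0..4 are below 5
    assert(countsearch<10>(sample, 100) == 10); // every element is below 100
}
```

If the ymm instructions disappear from the second build, the SLP vectorizer was responsible for them. Note that the flag applies to the whole translation unit, not just this function.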

Regarding "c++ - How do I disable vectorization in clang++?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51461924/
