
c++ - Why is strcmp not SIMD optimized?

Reposted. Author: IT老高. Updated: 2023-10-28 12:33:07

I tried to compile this program on an x64 machine:

#include <cstring>

int main(int argc, char* argv[])
{
    return ::std::strcmp(argv[0],
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really really really"
        "really really really really really really really long string"
    );
}

I compiled it like this:

g++ -std=c++11 -msse2 -O3 -g a.cpp -o a

But the resulting disassembly looks like this:

0x0000000000400480 <+0>: mov (%rsi),%rsi
0x0000000000400483 <+3>: mov $0x400628,%edi
0x0000000000400488 <+8>: mov $0x22d,%ecx
0x000000000040048d <+13>: repz cmpsb %es:(%rdi),%ds:(%rsi)
0x000000000040048f <+15>: seta %al
0x0000000000400492 <+18>: setb %dl
0x0000000000400495 <+21>: sub %edx,%eax
0x0000000000400497 <+23>: movsbl %al,%eax
0x000000000040049a <+26>: retq
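
As a reading aid, the inlined sequence above can be approximated in C++. This is only a sketch: `bounded_compare` is a hypothetical name, not something GCC emits or the standard library provides. It assumes the second argument is a literal whose size, including the terminating zero byte, is known at compile time (0x22d = 557 bytes here), so the loop stops either at the first differing byte or after that many bytes.

```cpp
#include <cstddef>

// Hypothetical helper that mirrors "repz cmpsb" followed by seta/setb/sub:
// compare at most n bytes, stop at the first difference, return -1, 0 or 1.
int bounded_compare(const char* s, const char* literal, std::size_t n) {
    const unsigned char* a = reinterpret_cast<const unsigned char*>(s);
    const unsigned char* b = reinterpret_cast<const unsigned char*>(literal);
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            // seta gives 1 when a[i] > b[i], setb gives 1 when a[i] < b[i].
            return (a[i] > b[i]) - (a[i] < b[i]);
        }
    }
    return 0;  // all n bytes matched, including the literal's terminator
}
```

Because the literal's terminator is part of the counted range, a shorter first string mismatches at its own zero byte, so the loop does not read past it.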

Why is no SIMD used? I would expect it to compare, say, 16 characters at a time. Should I write my own SIMD strcmp, or is that a nonsensical idea for some reason?

Best answer

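In an SSE2 implementation, how should the compiler make sure that no memory access happens past the end of the string? It would have to know the length first, and that requires scanning the string for the terminating zero byte.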

If you scan for the length of the string, you have already done most of the work of a strcmp function. Therefore there is no benefit in using SSE2.
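
To make the concern concrete, here is a minimal SSE2 sketch of a single 16-byte comparison step. `first_stop_in_block` is a hypothetical helper, and it works only under the assumption stated in its comment: both pointers must have at least 16 readable bytes. That precondition is exactly what a general strcmp cannot take for granted without first locating the terminator.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper. ASSUMPTION: at least 16 bytes are readable at both
// a and b. Returns the index of the first byte where the blocks differ or
// where a holds a zero byte, or 16 if neither happens in this block.
int first_stop_in_block(const char* a, const char* b) {
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));

    // 0xFF in every byte lane where a and b are equal.
    const __m128i equal = _mm_cmpeq_epi8(va, vb);
    // 0xFF in every byte lane where a holds the terminating zero.
    const __m128i is_zero = _mm_cmpeq_epi8(va, _mm_setzero_si128());

    // One bit per byte: set where the bytes differ or where a ends.
    const unsigned differ = ~static_cast<unsigned>(_mm_movemask_epi8(equal));
    const unsigned ends = static_cast<unsigned>(_mm_movemask_epi8(is_zero));
    unsigned stop = (differ | ends) & 0xFFFFu;

    if (stop == 0) {
        return 16;  // nothing to stop on; a caller would move to the next block
    }
    int index = 0;
    while ((stop & 1u) == 0) {  // find the lowest set bit
        stop >>= 1;
        ++index;
    }
    return index;
}
```

If either string ends a few bytes before an unmapped page, the two unaligned loads at the top are the accesses that could fault.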

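However, Intel added string-handling instructions in the SSE4.2 instruction set. These handle the terminating zero byte problem. For a nice write-up on them, read this blog post: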

http://www.strchr.com/strcmp_and_strlen_using_sse_4.2
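
As a rough illustration of those instructions, the sketch below uses the `_mm_cmpistri` intrinsic (PCMPISTRI) from `<nmmintrin.h>`, which treats a zero byte inside a 16-byte block as the end of the string. `first_difference_sse42` and `kMode` are hypothetical names. The `target` attribute is a GCC/Clang extension assumed here so that the function builds without `-msse4.2`. The sketch still loads a full 16 bytes from each pointer, so it assumes both buffers are at least that large.

```cpp
#include <nmmintrin.h>  // SSE4.2 intrinsics (_mm_cmpistri)

// Compare unsigned bytes pairwise for equality, negate the result so that
// set bits mark differences, and report the index of the lowest set bit.
constexpr int kMode = _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH |
                      _SIDD_NEGATIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT;

// Hypothetical helper. ASSUMPTION: 16 readable bytes at both a and b.
// Returns the index of the first position where the two strings differ
// within this block, or 16 if they agree up to the end of the block or
// up to a terminator that both strings share.
__attribute__((target("sse4.2")))
int first_difference_sse42(const char* a, const char* b) {
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    // Bytes after a zero byte are ignored by the instruction itself.
    return _mm_cmpistri(va, vb, kMode);
}
```

A return value of 16 means no difference was found in this block. A full strcmp would additionally check for a terminator, for example with `_mm_cmpistrz`, before advancing to the next block.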

Regarding "c++ - Why is strcmp not SIMD optimized?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26586060/
