simd - Packing ASCII strings into a 7-bit binary blob with ARM-v8 Neon SIMD

Reposted. Author: 行者123. Updated: 2023-12-05 09:25:53
Following up on my x86 question, I would like to know how to vectorize the following code efficiently on Arm-v8:


#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Packs eight 7-bit values (one per byte of x) into the low 56 bits of the result.
static inline uint64_t Compress8x7bit(uint64_t x) {
  x = ((x & 0x7F007F007F007F00) >> 1) | (x & 0x007F007F007F007F);
  x = ((x & 0x3FFF00003FFF0000) >> 2) | (x & 0x00003FFF00003FFF);
  uint64_t res = ((x & 0x0FFFFFFF00000000) >> 4) | (x & 0x000000000FFFFFFF);

  /* does the following:
   uint64_t res = (x & 0xFF);
   for (unsigned i = 1; i <= 7; ++i) {
     x >>= 1;
     res |= (x & (0x7FULL << 7 * i));
   }
  */
  return res;
}

void ascii_pack2(const char* ascii, size_t len, uint8_t* bin) {
  uint64_t val;
  const char* end = ascii + len;

  while (ascii + 8 <= end) {
    memcpy(&val, ascii, 8);
    val = Compress8x7bit(val);
    // Stores 8 bytes but advances by 7. On a little-endian host the 8th byte is
    // zero and is overwritten by whatever is written next, so bin needs one
    // spare byte when len is a multiple of 8.
    memcpy(bin, &val, 8);
    bin += 7;
    ascii += 8;
  }

  // epilog - we do not pack since we have less than 8 bytes.
  while (ascii < end) {
    *bin++ = *ascii++;
  }
}
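
As a sanity check on the scalar routine, the sketch below compares the three-step mask-and-shift version against the loop from its comment. The name `compress_ref` is hypothetical (it is not part of the question), and the sketch assumes every input byte is plain ASCII, i.e. bit 7 is clear.

```c
#include <assert.h>
#include <stdint.h>

/* Mask-and-shift version from the question. */
static inline uint64_t Compress8x7bit(uint64_t x) {
    x = ((x & 0x7F007F007F007F00ULL) >> 1) | (x & 0x007F007F007F007FULL);
    x = ((x & 0x3FFF00003FFF0000ULL) >> 2) | (x & 0x00003FFF00003FFFULL);
    return ((x & 0x0FFFFFFF00000000ULL) >> 4) | (x & 0x000000000FFFFFFFULL);
}

/* Hypothetical reference: the loop from the question's comment.
   After i one-bit right shifts, byte i of the input sits at bit 7*i. */
static uint64_t compress_ref(uint64_t x) {
    /* Assumption: all eight bytes are 7-bit ASCII. */
    assert((x & 0x8080808080808080ULL) == 0);
    uint64_t res = x & 0x7F;
    for (unsigned i = 1; i <= 7; ++i) {
        x >>= 1;
        res |= x & (0x7FULL << (7 * i));
    }
    return res;
}
```

Calling both functions on the same word, for example `0x4847464544434241` (the bytes of "ABCDEFGH" loaded little-endian), should give identical results. The top byte of the result is always zero, which is why only 7 of the 8 stored bytes carry data.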

Best answer

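With variable shifts the problem becomes quite simple. Neon's vshl takes a separate signed shift count for every lane (positive counts shift left, negative counts shift right), so each byte can be moved by a different amount in a single instruction: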

          MSB                                                            LSB
a0 = 0AAAAAAA'0bBBBBBB'0ccCCCCC'0dddDDDD'0eeeeEEE'0fffffFF'0ggggggG'0hhhhhhh
a1 = AAAAAAA0'BBBBBB00'CCCCC000'DDDD0000'EEE00000'FF000000'G0000000'00000000 = a0 << {1,2,3,4,5,6,7,8}
a2 = 00000000'0000000b'000000cc'00000ddd'0000eeee'000fffff'00gggggg'0hhhhhhh = a0 >> {7,6,5,4,3,2,1,0}
a3 = 00000000'AAAAAAA0'BBBBBB00'CCCCC000'DDDD0000'EEE00000'FF000000'G0000000 = ext(a1, a1, 1);
a4 = 00000000'AAAAAAAb'BBBBBBcc'CCCCCddd'DDDDeeee'EEEfffff'FFgggggg'Ghhhhhhh = a2 | a3

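// d0: uint8x8_t holding 8 ASCII bytes (bit 7 of every byte clear)
uint8x8_t d1 = vshl_u8(d0, vcreate_s8(0x0102030405060708ull));  // a1: per-lane left shift
uint8x8_t d2 = vshl_u8(d0, vcreate_s8(0xf9fafbfcfdfeff00ull));  // a2: negative counts shift right
uint8x8_t d3 = vext_u8(d1, d1, 1);                              // a3: rotate lanes down by one byte
return vorr_u8(d2, d3);                                         // a4: low 7 bytes hold the packed bits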
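
One possible way to plug these four steps into the packing loop is sketched below. The names `pack8` and `ascii_pack_lanes` are hypothetical, not from the answer. The Neon path is compiled only when `__ARM_NEON` is defined; on other hosts a scalar emulation of the same lane-wise steps is used so the logic can still be checked. The sketch copies exactly 7 bytes per group, which trades the single 8-byte store of the original loop for not touching any byte past the packed data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: packs 8 ASCII bytes from src into dst[0..6]; dst[7] becomes 0. */
static void pack8(const uint8_t *src, uint8_t *dst) {
#if defined(__ARM_NEON)
    uint8x8_t d0 = vld1_u8(src);
    uint8x8_t d1 = vshl_u8(d0, vcreate_s8(0x0102030405060708ull)); /* a1 */
    uint8x8_t d2 = vshl_u8(d0, vcreate_s8(0xf9fafbfcfdfeff00ull)); /* a2 */
    uint8x8_t d3 = vext_u8(d1, d1, 1);                             /* a3 */
    vst1_u8(dst, vorr_u8(d2, d3));                                 /* a4 */
#else
    /* Scalar emulation of the lane-wise steps, for hosts without Neon. */
    uint8_t a1[8], out[8];
    for (int i = 0; i < 8; ++i)
        a1[i] = (uint8_t)(src[i] << (8 - i));  /* lane i shifted left by 8-i; lane 0 becomes 0 */
    for (int i = 0; i < 8; ++i)
        out[i] = (uint8_t)((src[i] >> i) | a1[(i + 1) & 7]);  /* a2 | a3 */
    memcpy(dst, out, 8);
#endif
}

/* Hypothetical variant of ascii_pack2 built on pack8. */
void ascii_pack_lanes(const char *ascii, size_t len, uint8_t *bin) {
    assert(ascii != NULL && bin != NULL);
    const char *end = ascii + len;
    uint8_t tmp[8];

    while (end - ascii >= 8) {
        pack8((const uint8_t *)ascii, tmp);
        memcpy(bin, tmp, 7);  /* only the low 7 bytes carry data */
        bin += 7;
        ascii += 8;
    }

    /* Tail of fewer than 8 bytes is copied unpacked, as in the question. */
    while (ascii < end) {
        *bin++ = (uint8_t)*ascii++;
    }
}
```

For a 10-byte input such as "ABCDEFGHIJ", this writes 7 packed bytes followed by the unpacked 'I' and 'J', for 9 output bytes in total.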

Regarding simd - Packing ASCII strings into a 7-bit binary blob with ARM-v8 Neon SIMD, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74846499/
