c++ - Endianness confusion with ARM Neon vtbx

Reposted. Author: 行者123. Updated: 2023-11-28 05:17:03

I have a 16-byte permutation mask, uint8_t[16], and a 16-byte data array, uint32_t[4]. I want to "shuffle" this data array with vtbl like this (a mask value of 0x10 is out of range and should produce 0):

            0    1    2    3    4    5    6    7     8    9    A    B    C    D    E    F
Data   ||0x00,0x00,0x01,0x02|0x00,0x03,0x00,0x04||0x05,0x06,0x07,0x08|0x00,0x00,0x00,0x09||
SMask  ||0x02,0x03,0x05,0x06|0x07,0x08,0x09,0x0A||0x0B,0x0F,0x10,0x10|0x10,0x10,0x10,0x10||
Result ||0x01,0x02,0x03,0x00|0x04,0x05,0x06,0x07||0x08,0x09,0x00,0x00|0x00,0x00,0x00,0x00||

This is my code so far:

#include <iostream>
#include <arm_neon.h>

inline uint8x16_t Shuffle(const uint8x16_t & src, const uint8x16_t & shuffle) {
    return vcombine_u8(
        vtbl2_u8(
            (const uint8x8x2_t &)src,
            vget_low_u8(shuffle)
        ),
        vtbl2_u8(
            (const uint8x8x2_t &)src,
            vget_high_u8(shuffle)
        )
    );
}

int main() {
    uint32_t* data32 = new uint32_t[4];
    data32[0] = 258;      // [0x00 0x00 0x01 0x02]
    data32[1] = 196612;   // [0x00 0x03 0x00 0x04]
    data32[2] = 84281096; // [0x05 0x06 0x07 0x08]
    data32[3] = 9;        // [0x00 0x00 0x00 0x09]
    /* load the data into a vector register */
    uint32x4_t data32Vec = vld1q_u32(data32);

    uint8_t* sMask = new uint8_t[16];
    sMask[0] = 2;
    sMask[1] = 3;
    sMask[2] = 5;
    sMask[3] = 6;
    sMask[4] = 7;
    sMask[5] = 8;
    sMask[6] = 9;
    sMask[7] = 10;
    sMask[8] = 11;
    sMask[9] = 15;
    sMask[10] = 16;
    sMask[11] = 16;
    sMask[12] = 16;
    sMask[13] = 16;
    sMask[14] = 16;
    sMask[15] = 16;
    /* load the permutation mask into a vector register */
    uint8x16_t shuffleMask = vld1q_u8(sMask);

    uint8_t* comprData = new uint8_t[16];
    /* shuffle the data with the mask and store it into a uint8_t[16] */
    vst1q_u8(comprData, Shuffle(vreinterpretq_u8_u32(data32Vec), shuffleMask));
    for (int i = 0; i < 16; ++i) {
        std::cout << (unsigned)comprData[i] << " ";
    }
    std::cout << std::endl;
    delete[] comprData;
    delete[] sMask;
    delete[] data32;
    return 0;
}

The output looks like this:

0   0   0   3   0   8   7   6   5   0   0   0   0   0   0   0

It should look like this:

1   2   3   0   4   5   6   7   8   9   0   0   0   0   0   0

I think this has something to do with endianness, but I can't see where the problem is. Does anyone have a hint?

I updated the code based on ErmIg's answer. The main problem was that I had confused vtbx and vtbl.


Best answer

This might help you (I use these functions to permute the bytes of a vector with ARM NEON):

// Permutes the 16 bytes of src; indices >= 16 produce 0.
inline uint8x16_t Shuffle(const uint8x16_t & src, const uint8x16_t & shuffle)
{
    return vcombine_u8(
        vtbl2_u8((const uint8x8x2_t &)src, vget_low_u8(shuffle)),
        vtbl2_u8((const uint8x8x2_t &)src, vget_high_u8(shuffle)));
}

// Same idea with a 32-byte table; indices >= 32 produce 0.
inline uint8x16_t Shuffle(const uint8x16x2_t & src, const uint8x16_t & shuffle)
{
    return vcombine_u8(
        vtbl4_u8((const uint8x8x4_t &)src, vget_low_u8(shuffle)),
        vtbl4_u8((const uint8x8x4_t &)src, vget_high_u8(shuffle)));
}

For more on "c++ - Endianness confusion with ARM Neon vtbx", see the similar question on Stack Overflow: https://stackoverflow.com/questions/42418140/
