
c - Loading an integer array into a SIMD register

Reposted. Author: 行者123. Updated: 2023-11-30 15:12:23

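I am currently trying to use SSE to load an integer array into a SIMD register. I have an aligned array Ai of 32-bit integers and want to load 4 consecutive elements into the SIMD register Xi. However, after executing _mm_load_si128, every value stored in Xi except the first one is garbage.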

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <immintrin.h>

// number has to be divisible by 4 without remainder
#define VECTOR_SIZE 8

int main() {

    __attribute__((aligned (16))) int32_t *Ai = (int32_t*) malloc(VECTOR_SIZE * sizeof(int32_t));

    for(int i = 0; i < VECTOR_SIZE; i++) {
        Ai[i] = rand() % 100000;
    }

    __m128i Xi;

    for(int i = 0; i < VECTOR_SIZE; i+=4) {
        Xi = _mm_load_si128((__m128i*) &Ai[i]);

        // show content of Xi and Ai
        for(int j = 0; j < 4; j++) {
            printf("Xi[%d] = %d\t Ai[%d] = %d\n", j, Xi[j], i+j, Ai[i+j]);
        }
    }

    free(Ai);
}

Here is an example output:

Xi[0] = 16807	 Ai[0] = 16807
Xi[1] = 50073	 Ai[1] = 75249
Xi[2] = 1489217992	 Ai[2] = 50073
Xi[3] = 1346391152	 Ai[3] = 43658
Xi[0] = 8930	 Ai[4] = 8930
Xi[1] = 27544	 Ai[5] = 11272
Xi[2] = 1489217992	 Ai[6] = 27544
Xi[3] = 1346391168	 Ai[7] = 50878

What is going wrong?

Best answer

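The problem is in how you print Xi, not in the load. Xi[j] does not index 32-bit elements: GCC and Clang declare __m128i as a vector of two 64-bit long long values, so Xi[0] and Xi[1] are the two 64-bit halves of the register, and Xi[2] and Xi[3] read past its end. Passing a long long to %d is also undefined behavior; on x86-64 it happens to print the low 32 bits, which is why your output shows Ai[0] and Ai[2] followed by garbage. What you probably meant is something like this, using a union to view the same 16 bytes either as one __m128i or as four int32_t values: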

// the same 16 bytes, viewed as one SSE register or as four 32-bit ints
union {
    __m128i m128;
    int32_t i32[4];
} Xi;

for(int i = 0; i < VECTOR_SIZE; i+=4) {
    Xi.m128 = _mm_load_si128((__m128i*) &Ai[i]);

    // show content of Xi and Ai
    for(int j = 0; j < 4; j++) {
        printf("Xi[%d] = %d\t Ai[%d] = %d\n", j, Xi.i32[j], i+j, Ai[i+j]);
    }
}

Here is the example output:

Xi[0] = 89383	 Ai[0] = 89383
Xi[1] = 30886	 Ai[1] = 30886
Xi[2] = 92777	 Ai[2] = 92777
Xi[3] = 36915	 Ai[3] = 36915
Xi[0] = 47793	 Ai[4] = 47793
Xi[1] = 38335	 Ai[5] = 38335
Xi[2] = 85386	 Ai[6] = 85386
Xi[3] = 60492	 Ai[7] = 60492
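If you would rather not depend on a union, a minimal sketch (assuming C11 and an x86 target with SSE2) is to store the register back into a plain int32_t[4] with _mm_storeu_si128 and read the lanes from that array. The sketch also allocates the input with C11 aligned_alloc, because __attribute__((aligned (16))) on the pointer variable Ai aligns only the pointer itself, not the buffer malloc returns. malloc typically returns 16-byte aligned memory on x86-64, but _mm_load_si128 requires that alignment and may fault without it. The helpers alloc_aligned_i32 and load_and_extract are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <immintrin.h>   /* SSE2: _mm_load_si128, _mm_storeu_si128 */

/* Hypothetical helper: allocate n int32_t values on a 16-byte boundary,
   as _mm_load_si128 requires. C11 aligned_alloc wants the size to be a
   multiple of the alignment, so n is assumed to be a multiple of 4
   (4 * sizeof(int32_t) == 16). Release the buffer with free(). */
static int32_t *alloc_aligned_i32(size_t n)
{
    return aligned_alloc(16, n * sizeof(int32_t));
}

/* Hypothetical helper: load src[0..3] into an SSE register, then copy the
   four 32-bit lanes into out[0..3] so they can be read as ordinary ints. */
static void load_and_extract(const int32_t *src, int32_t out[4])
{
    assert(((uintptr_t)src % 16) == 0);            /* aligned-load precondition */
    __m128i x = _mm_load_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)out, x);           /* out need not be aligned */
}
```

In the question's loop, calling load_and_extract(&Ai[i], lanes) and printing lanes[j] instead of Xi[j] gives the same result as the union version.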

For "c - Loading an integer array into a SIMD register", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35100708/
