
cuda - Misaligned address in CUDA

Reposted. Author: 行者123. Updated: 2023-12-01 04:27:18

Can anyone tell me what is wrong with the following code in my CUDA kernel:

__constant__ unsigned char MT[256] = {
    0xde, 0x6f, 0x6f, 0xb1, 0xde, 0x6f, 0x6f, 0xb1, 0x91, 0xc5, 0xc5, 0x54, 0x91, 0xc5, 0xc5, 0x54, /* .... */ };

typedef unsigned int U32;

__global__ void Kernel(unsigned int *PT, unsigned int *CT, unsigned int *rk)
{
    long int i;
    __shared__ unsigned char sh_MT[256];

    // 4 threads copy the 256-byte table as 64 32-bit words (16 words per thread)
    for (i = 0; i < 64; i += 4)
        ((U32*)sh_MT)[threadIdx.x + i] = ((U32*)MT)[threadIdx.x + i];

    __shared__ unsigned int sh_rkey[4];
    __shared__ unsigned int sh_state_pl[4];
    __shared__ unsigned int sh_state_ct[4];

    sh_state_pl[threadIdx.x] = PT[threadIdx.x];
    sh_rkey[threadIdx.x] = rk[threadIdx.x];
    __syncthreads();

    sh_state_ct[threadIdx.x] = ((U32*)sh_MT)[sh_state_pl[threadIdx.x]] ^
        ((U32*)(sh_MT + 3))[((sh_state_pl[(1 + threadIdx.x) % 4] >> 8) & 0xff)] ^   // the debugger stops here
        ((U32*)(sh_MT + 2))[((sh_state_pl[(2 + threadIdx.x) % 4] >> 16) & 0xff)] ^
        ((U32*)(sh_MT + 1))[((sh_state_pl[(3 + threadIdx.x) % 4] >> 24) & 0xff)];

    CT[threadIdx.x] = sh_state_ct[threadIdx.x];
}

On this line of code,
((U32*)(sh_MT+3))......

the CUDA debugger gives me the error message:
misaligned address

How can I fix this error?

I am using CUDA 7 in MSVC, and I launch the kernel with 1 block and 4 threads, as follows:
unsigned int *state;   // host-side pointers to device buffers
unsigned int *key;
unsigned int *ct;
// ...
int main()
{
    cudaMalloc((void**)&state, 16);   // 16 bytes = 4 unsigned ints
    cudaMalloc((void**)&ct, 16);
    cudaMalloc((void**)&key, 16);
    // cudaMemcpy(copy some values to => state, ct, key);
    Kernel<<<1, 4>>>(state, ct, key);
}

Keep in mind that I cannot change the type of my "MT table".
Thanks in advance for any suggestions or answers.

Best answer

The error message means that the pointer is not aligned to the boundary the processor requires.
From the CUDA Programming Guide, section 5.3.2:

Global memory instructions support reading or writing words of size equal to 1, 2, 4, 8, or 16 bytes. Any access (via a variable or a pointer) to data residing in global memory compiles to a single global memory instruction if and only if the size of the data type is 1, 2, 4, 8, or 16 bytes and the data is naturally aligned (i.e., its address is a multiple of that size).
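To make the rule concrete, here is a small host-side C++ sketch; `is_naturally_aligned` and `u32_view_address` are hypothetical names introduced only for illustration. It models the expression `((U32*)(sh_MT+3))[j]`: the element address is `sh_MT + 3 + 4*j`, which leaves a remainder of 3 when divided by 4 for every `j`, so no index can make that 32-bit load aligned, whereas `((U32*)sh_MT)[j]` is aligned whenever `sh_MT` itself is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if p is a multiple of `size` bytes, i.e. "naturally aligned"
// for an access of that size in the sense of the Programming Guide.
bool is_naturally_aligned(const void* p, std::size_t size) {
    return reinterpret_cast<std::uintptr_t>(p) % size == 0;
}

// Byte address of element j in ((uint32_t*)(base + k))[j], computed
// without performing the (possibly misaligned) 32-bit load itself.
const unsigned char* u32_view_address(const unsigned char* base,
                                      std::size_t k, std::size_t j) {
    return base + k + 4 * j;
}
```

With a 4-byte-aligned buffer, every `k = 0` address passes the 4-byte check and every `k = 3` address fails it, independent of `j`.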


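That is what the debugger is trying to tell you: basically, you must not dereference a pointer to a 32-bit value through an address that is not aligned on a 32-bit boundary. The quoted passage is about global memory, but shared-memory accesses such as those to sh_MT are subject to the same natural-alignment requirement.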
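You can use (U32*)(sh_MT) and (U32*)(sh_MT+4) just fine (provided sh_MT itself starts on a 4-byte boundary, which an unsigned char array is not guaranteed to do unless it is declared with __align__(4)), but not (U32*)(sh_MT+3) or anything similar.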
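One way to keep using only aligned 32-bit loads (a different technique from reading bytes individually) is to read the two aligned words that straddle the unaligned position and shift them together. The sketch below assumes little-endian byte order, which is what NVIDIA GPUs use, and a 4-byte-aligned table; `load_u32_straddled` is a hypothetical helper name. The macro guard lets the same code compile with nvcc and with an ordinary host compiler. On the device, the shift-and-or could also be expressed with the `__funnelshift_r(lo, hi, 8 * k)` intrinsic.

```cpp
#include <cassert>
#include <cstdint>

// With nvcc these are real qualifiers; with a host-only compiler they vanish.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Returns the little-endian 32-bit value starting at byte `byte_offset` of a
// 4-byte-aligned table, using only aligned 32-bit loads. When byte_offset is
// not a multiple of 4, words[byte_offset / 4 + 1] is read too, so it must exist.
__host__ __device__ inline std::uint32_t
load_u32_straddled(const std::uint32_t* words, unsigned byte_offset) {
    const unsigned w = byte_offset / 4;   // aligned word holding the first byte
    const unsigned k = byte_offset % 4;   // misalignment in bytes (0..3)
    if (k == 0) {
        return words[w];                  // already aligned; also avoids a shift by 32
    }
    const std::uint32_t lo = words[w];
    const std::uint32_t hi = words[w + 1];
    // Drop the k low bytes of `lo` and fill the top bytes from the low bytes of `hi`.
    return (lo >> (8 * k)) | (hi << (32 - 8 * k));
}
```

In the kernel, `((U32*)(sh_MT+3))[j]` would correspond to `load_u32_straddled((const U32*)sh_MT, 3 + 4*j)`, with `sh_MT` declared `__align__(4)`. Because the extra word is read, a 256-byte table only covers `j <= 62` for offsets 1 to 3.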
You will probably have to read the bytes individually and combine them into a 32-bit value.
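Reading the bytes individually could look like the sketch below; `load_u32_bytes` and `lookup_u32` are hypothetical helper names. Single-byte loads have no alignment requirement, and assembling them in little-endian order (`p[0]` least significant) gives the value a 32-bit load of the same bytes would return on an NVIDIA GPU. The table itself keeps its `unsigned char` type.

```cpp
#include <cassert>
#include <cstdint>

// With nvcc these are real qualifiers; with a host-only compiler they vanish.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Builds a 32-bit value from four byte loads, so `p` may have any alignment.
__host__ __device__ inline std::uint32_t load_u32_bytes(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Same value as ((U32*)(table + k))[j], without the misaligned 32-bit load.
__host__ __device__ inline std::uint32_t lookup_u32(const unsigned char* table,
                                                    unsigned k, unsigned j) {
    return load_u32_bytes(table + k + 4u * j);
}
```

In the kernel, `((U32*)(sh_MT+3))[j]` would become `lookup_u32(sh_MT, 3, j)`, and likewise for offsets 2 and 1. Separately from the alignment error, note that `sh_MT` holds 256 bytes, so for offsets 1 to 3 the reads stay inside the table only while `j <= 62`, while the question's indices (`& 0xff`) go up to 255.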

A similar question about cuda - Misaligned address in CUDA can be found on Stack Overflow: https://stackoverflow.com/questions/37323053/
