c++ - How to copy data from unsigned int to ulong4 in CUDA

Reposted. Author: 行者123. Updated: 2023-11-30 01:20:10

.h file:

#define VECTOR_SIZE 1024   

.cpp file:

int main ()
{
    unsigned int* A;
    A = new unsigned int [VECTOR_SIZE];

    CopyToDevice (A);
}

.cu file:

void CopyToDevice (unsigned int *A)
{
    ulong4 *UA;
    unsigned int VectorSizeUlong4 = VECTOR_SIZE / 4;
    unsigned int VectorSizeBytesUlong4 = VectorSizeUlong4 * sizeof(ulong4);

    cudaMalloc( (void**)&UA, VectorSizeBytesUlong4 );

    // how to use cudaMemcpy to copy data from A to UA?

    // I tried to do the following but it gave access violation error:
    for (int i=0; i<VectorSizeUlong4; ++i)
    {
        UA[i].x = A[i*4 + 0];
        UA[i].y = A[i*4 + 1];
        UA[i].z = A[i*4 + 2];
        UA[i].w = A[i*4 + 3];
    }
    // I also tried to copy *A to device and then work on it instead going back to CPU to access *A every time but this did not work again
}

[image from the original question, not preserved]

Best answer

CUDA's ulong4 is a 16-byte aligned structure defined as

struct __builtin_align__(16) ulong4
{
    unsigned long int x, y, z, w;
};

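This means that, on platforms where unsigned long is 32 bits wide (for example 64-bit Windows), a stream of four consecutive 32-bit unsigned source integers is the same size as the ulong4 you want to fill from it. Where unsigned long is 64 bits wide (for example 64-bit Linux), the sizes differ and the cast described here is not valid. The loop in the question fails because UA points to device memory, which host code cannot dereference. The simplest solution is contained right in the text of the image you posted: just cast (implicitly or explicitly) the unsigned int pointer to a ulong4 pointer, use cudaMemcpy directly on the host and device memory, and pass the resulting device pointer to whatever kernel function you have that requires ulong4 input. Your device transfer function could look something like: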

ulong4* CopyToDevice (unsigned int* A)
{
    ulong4 *UA, *UA_h;
    size_t VectorSizeUlong4 = VECTOR_SIZE / 4;
    size_t VectorSizeBytesUlong4 = VectorSizeUlong4 * sizeof(ulong4);

    cudaMalloc( (void**)&UA, VectorSizeBytesUlong4);
    UA_h = reinterpret_cast<ulong4*>(A); // not necessary but increases transparency
    // cudaMemcpy needs the transfer direction as its fourth argument
    cudaMemcpy(UA, UA_h, VectorSizeBytesUlong4, cudaMemcpyHostToDevice);

    return UA;
}

[Standard disclaimer: written in a browser, never tested or compiled, use at your own risk]
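The pointer cast above relies on four unsigned int values occupying exactly the bytes of one ulong4. Where that does not hold, one possible alternative is to pack the values field by field on the host before the transfer. The following is a minimal host-only C++ sketch of that idea, not the answer's method. `ULong4Like`, `kCastIsSizeSafe`, and `PackToULong4` are hypothetical names introduced here for illustration; `ULong4Like` is a stand-in that mirrors the struct shown above so the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's ulong4 (assumption: same fields and
// 16-byte alignment). With the CUDA toolkit available, use ulong4 itself.
struct alignas(16) ULong4Like {
    unsigned long x, y, z, w;
};

// True only when four unsigned ints occupy exactly the bytes of one
// ULong4Like, which is the condition the pointer cast depends on.
constexpr bool kCastIsSizeSafe =
    sizeof(ULong4Like) == 4 * sizeof(unsigned int);

// Packs groups of four unsigned ints into ULong4Like elements one field at a
// time. This works whether unsigned long is 32 or 64 bits wide. Any trailing
// elements beyond a multiple of four are ignored.
std::vector<ULong4Like> PackToULong4(const unsigned int* src, std::size_t count) {
    assert(src != nullptr || count == 0);
    std::vector<ULong4Like> packed(count / 4);
    for (std::size_t i = 0; i < packed.size(); ++i) {
        packed[i].x = src[i * 4 + 0];
        packed[i].y = src[i * 4 + 1];
        packed[i].z = src[i * 4 + 2];
        packed[i].w = src[i * 4 + 3];
    }
    return packed;
}
```

In CUDA code, the packed host buffer (`packed.data()`, `packed.size() * sizeof(ulong4)` bytes) could then be passed to cudaMemcpy with cudaMemcpyHostToDevice. The trade-off is one extra host-side copy in exchange for not depending on the width of unsigned long.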

Regarding "c++ - How to copy data from unsigned int to ulong4 in CUDA", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19760034/
