c++ - How to embed a CUDA texture object in a struct?

Reposted. Author: 太空狗. Updated: 2023-10-29 20:56:16

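We have successfully used the following post to help create structures that contain basic types (such as int *). Textures give read-only arrays a good performance boost. We use many of them, which makes the parameter lists of our kernels and kernel subfunctions long and complicated. We would like to embed the textures in a structure to reduce the length and complexity of those parameter lists.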

Copying a struct containing pointers to CUDA device

Here is a snippet that represents the approach our code uses. It compiles, but crashes at runtime.

// Initialize texture description
cudaTextureDesc textureDescription;
memset(&textureDescription, 0, sizeof(textureDescription));
textureDescription.readMode = cudaReadModeElementType;

// Create Texture from variable
cudaTextureObject_t texture = 0;
cudaResourceDesc resource;
memset(&resource, 0, sizeof(resource));
resource.resType = cudaResourceTypeLinear;
resource.res.linear.devPtr = intArray;
resource.res.linear.desc.f = cudaChannelFormatKindSigned;
resource.res.linear.desc.x = 32; // bits per channel
resource.res.linear.sizeInBytes = count*sizeof(int);
cudaCreateTextureObject(&texture, &resource, &textureDescription, NULL);

// These declarations are in the .h file
typedef struct SampleStructure {
    cudaTextureObject_t texture;
} SampleStructure;
SampleStructure *structureHost;
SampleStructure *structureDevice;

// Create host and device structures
structureHost = (SampleStructure *)malloc(sizeof(SampleStructure));
cudaMalloc(&structureDevice, sizeof(SampleStructure));

// Assign the texture object to the host structure
structureHost->texture = texture;

// Copy the host structure to Global Memory
cudaMemcpy(structureDevice, structureHost, sizeof(SampleStructure), cudaMemcpyHostToDevice);

// Pass Texture and Texture-embedded-in-structure to kernel
kernel<<<1,1>>>(texture, structureDevice);

...
__global__ void
kernel(cudaTextureObject_t texture, SampleStructure *structureDevice) {
    value = tex1Dfetch<int>(texture, index); // Runs successfully at runtime
    value = tex1Dfetch<int>(structureDevice->texture, index); // Crashes at runtime
}

When the kernel code (or a subfunction) uses the "texture" variable, it runs correctly. When it uses "structureDevice->texture" instead, it crashes at runtime.

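Could someone show simple code that demonstrates how to embed a texture object in a structure that is passed to a kernel and runs without crashing? Or could someone point out where the code we provided may be wrong?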

Best answer

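Passing the structure by value produced a working solution. Here is the equivalent code that makes it work. Thanks to @talonmies for the suggestion.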

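Although a structure simplifies the parameter list, it can slow execution, because the system must make 2 accesses to global memory instead of 1: one to fetch the structure and one to fetch the texture. To improve performance, the structure can be copied to shared memory and read from there.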

// Create the Texture Object
// (textureDescription is initialized as in the question)
cudaTextureObject_t texture = 0;
cudaResourceDesc resource;
memset(&resource, 0, sizeof(resource));
resource.resType = cudaResourceTypeLinear;
resource.res.linear.devPtr = intArray;
resource.res.linear.desc.f = cudaChannelFormatKindSigned;
resource.res.linear.desc.x = 32; // bits per channel
resource.res.linear.sizeInBytes = count*sizeof(int);
cudaCreateTextureObject(&texture, &resource, &textureDescription, NULL);

// These structure declarations are in the .h file
typedef struct SampleStructure {
    cudaTextureObject_t texture;
} SampleStructure;
SampleStructure structureHost;

// Assign the texture object to the host structure
structureHost.texture = texture;

// Pass Texture and Texture-object-embedded-in-structure to kernel
kernel<<<1,1>>>(texture, structureHost);

...
__global__ void
kernel(cudaTextureObject_t texture, SampleStructure structureDevice) {
    __shared__ SampleStructure structureSharedMemory;

    // Copy the structure to shared memory for faster access
    if (threadIdx.x == 0)
        structureSharedMemory = structureDevice;
    // Block-wide barrier: no thread reads the shared copy before thread 0
    // has written it (__threadfence_block() alone does not make threads wait)
    __syncthreads();

    value = tex1Dfetch<int>(texture, index); // Runs successfully at runtime
    value = tex1Dfetch<int>(structureSharedMemory.texture, index); // Runs successfully at runtime
}
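
For readers who want something that compiles as a single file, the following is a minimal sketch of the by-value approach described above, not the original poster's full program. The CUDA_CHECK macro, the out buffer, the test data, and the main function are illustrative assumptions that do not appear in the original post; the texture setup follows the code above. It assumes a device that supports texture objects (compute capability 3.0 or later). For brevity the sketch reads the handle straight from the by-value kernel argument and leaves out the shared-memory copy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): stop with a message
// as soon as any CUDA runtime call reports an error.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

typedef struct SampleStructure {
    cudaTextureObject_t texture;
} SampleStructure;

// The structure is received by value, so the texture handle is part of the
// kernel arguments and is not fetched through a pointer into global memory.
__global__ void kernel(SampleStructure structureDevice, int *out, int count) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < count)
        out[index] = tex1Dfetch<int>(structureDevice.texture, index);
}

int main() {
    const int count = 8;  // illustrative size
    int host[count];
    for (int i = 0; i < count; ++i) host[i] = i * 10;

    // Device buffers: the texture's backing array and an output array
    int *intArray = NULL;
    int *out = NULL;
    CUDA_CHECK(cudaMalloc(&intArray, count * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&out, count * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(intArray, host, count * sizeof(int),
                          cudaMemcpyHostToDevice));

    // Describe the linear memory that backs the texture
    cudaResourceDesc resource;
    memset(&resource, 0, sizeof(resource));
    resource.resType = cudaResourceTypeLinear;
    resource.res.linear.devPtr = intArray;
    resource.res.linear.desc.f = cudaChannelFormatKindSigned;
    resource.res.linear.desc.x = 32;  // bits per channel
    resource.res.linear.sizeInBytes = count * sizeof(int);

    cudaTextureDesc textureDescription;
    memset(&textureDescription, 0, sizeof(textureDescription));
    textureDescription.readMode = cudaReadModeElementType;

    // Create the texture object directly inside the host-side structure
    SampleStructure structureHost;
    CUDA_CHECK(cudaCreateTextureObject(&structureHost.texture, &resource,
                                       &textureDescription, NULL));

    // Pass the structure by value
    kernel<<<1, count>>>(structureHost, out, count);
    CUDA_CHECK(cudaGetLastError());       // launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while running

    // Compare what the kernel fetched with the data uploaded earlier
    int result[count];
    CUDA_CHECK(cudaMemcpy(result, out, count * sizeof(int),
                          cudaMemcpyDeviceToHost));
    int mismatches = 0;
    for (int i = 0; i < count; ++i)
        if (result[i] != host[i]) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    CUDA_CHECK(cudaDestroyTextureObject(structureHost.texture));
    CUDA_CHECK(cudaFree(out));
    CUDA_CHECK(cudaFree(intArray));
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

It can be built with something like nvcc example.cu -o example (the file name is hypothetical). Checking the return code of every runtime call, as the macro does, makes a failure in texture creation or in the kernel launch show up at the call that caused it rather than as a later crash.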

Regarding "c++ - How to embed a CUDA texture object in a struct?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33987381/
