gpt4 book ai didi

c++ - 丢失在 CUDA 设备指针中

转载 作者:行者123 更新时间:2023-11-28 07:04:12 24 4
gpt4 key购买 nike

作为我论文工作的一部分,我正在 CUDA 项目中工作(修改其他人的代码、添加功能等)。作为 CUDA 的新手,这对我来说是一个真正的挑战。我正在使用计算能力 1.3 卡,4 x Tesla C1060。遗憾的是,我遇到了平台的一些限制。

我需要将几个新结构传递给设备,我相信它们已被正确复制。但是,当尝试在我的内核调用中将指针传递到设备上的结构时,我达到了 256 字节的限制(如本 question 中所述)。

我的代码是这样的:

// main.cu
static void RunGPU(HostThreadState *hstate)
{
SimState *HostMem = &(hstate->host_sim_state);
SimState DeviceMem;

TetrahedronStructGPU *h_root = &(hstate->root);
TetrahedronStructGPU *d_root;
TriangleFacesGPU *h_faces = &(hstate->faces);
TriangleFacesGPU *d_faces;

GPUThreadStates tstates;

unsigned int n_threads = hstate->n_tblks * NUM_THREADS_PER_BLOCK;
unsigned int n_tetras = hstate->n_tetras; // 9600
unsigned int n_faces = hstate->n_faces; // 38400

InitGPUStates(HostMem, h_root, h_faces, &DeviceMem, &tstates, hstate->sim,
d_root, d_faces, n_threads, n_tetras, n_faces );
cudaThreadSynchronize();

...

kernel<<<dimGrid, dimBlock, k_smem_sz>>>(DeviceMem, tstates, /*OK, these 2*/
d_root, d_faces);
// Limit of 256 bytes adding d_root and/or d_faces
cudaThreadSynchronize();

...

}

InitGPUStates 函数在另一个源文件中:

// kernel.cu
int InitGPUStates(SimState* HostMem, TetrahedronStructGPU* h_root,
TriangleFacesGPU* h_faces,
SimState* DeviceMem, GPUThreadStates *tstates,
SimulationStruct* sim,
TetrahedronStructGPU* d_root, TriangleFacesGPU* d_faces,
int n_threads, int n_tetras, int n_faces)
{
unsigned int size;

// Allocate and copy RootTetrahedron (d_root) on device
size = n_tetras * sizeof(TetrahedronStructGPU); // Too big
checkCudaErrors(cudaMalloc((void**)&d_root, size));
checkCudaErrors(cudaMemcpy(d_root, h_root, size, cudaMemcpyHostToDevice));

// Allocate and copy Faces (d_faces) on device
size = n_faces * sizeof(TriangleFacesGPU); // Too big
checkCudaErrors(cudaMalloc((void**)&d_faces, size));
checkCudaErrors(cudaMemcpy(d_faces, h_faces, size, cudaMemcpyHostToDevice));

...
}

我知道我只需要传递指向设备内存位置的指针。如何获取设备中的地址?这种指针传递是否正确完成?

两个新结构是:

// header.h
typedef struct {
int idx;
int vertices[4];
float Nx, Ny, Nz, d;
} TriangleFacesGPU;

typedef struct {
int idx, region;
int vertices[4], faces[4], adjTetras[4];
float n, mua, mus, g;
} TetrahedronStructGPU;

// other structures
typedef struct {
BOOLEAN *is_active;
BOOLEAN *dead;
BOOLEAN *FstBackReflectionFlag;
int *NextTetrahedron;
UINT32 *NumForwardScatters;
UINT32 *NumBackwardScatters;
UINT32 *NumBackwardsSpecularReflections;
UINT32 *NumBiases;
UINT32 *p_layer;
GFLOAT *p_x, *p_y, *p_z;
GFLOAT *p_ux, *p_uy, *p_uz;
GFLOAT *p_w;
GFLOAT *Rspecular;
GFLOAT *LocationFstBias;
GFLOAT *OpticalPath;
GFLOAT *MaxDepth;
GFLOAT *MaxLikelihoodRatioIncrease;
GFLOAT *LikelihoodRatioIncreaseFstBias;
GFLOAT *LikelihoodRatio;
GFLOAT *LikelihoodRatioAfterFstBias;
GFLOAT *s, *sleft;
TetrahedronStructGPU *tetrahedron;
TriangleFacesGPU *faces;
} GPUThreadStates;

typedef struct {
UINT32 *n_p_left;
UINT64 *x;
UINT32 *a;
UINT64 *Rd_ra;
UINT64 *A_rz;
UINT64 *Tt_ra;
} SimState;

kernel的定义是

__global__ void kernel(SimState d_state, GPUThreadStates tstates,
TetrahedronStructGPU *d_root,
TriangleFacesGPU *d_faces);

我将致力于将 SimState d_state 更改为指针传递 SimState *d_state。以及 GPUThreadStates tstatesGPUThreadStates *tstates

最佳答案

看来你还没有初始化 DeviceMem 结构,它应该保存稍后应该用 cudaMalloc 初始化的指针。

你应该这样做:

SimState* DeviceMem;

cudaMalloc(&DeviceMem, sizeof(SimState))

也是(或任何其他为该指针分配内存的方式)。

关于c++ - 丢失在 CUDA 设备指针中,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/21997385/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com