
c - Getting a "multiple definition" error with a simple device function in CUDA C

Reposted. Author: 太空宇宙. Updated: 2023-11-04 06:25:32

I have a simple program made up of two CUDA files: main.cu and kernel.cu. Their goal is to compute the sum of two vectors.

// main.cu
#include <stdio.h>
#include <stdlib.h>
#include <time.h>   /* time(), used to seed rand() */
#include <cuda_runtime.h>
#include <cuda.h>

#include "kernel.cu"

int main(){
    /* Error code to check return values for CUDA calls */
    cudaError_t err = cudaSuccess;

    srand(time(NULL));
    int count = 100;
    int A[count], B[count];
    int *h_A, *h_B;
    h_A = A; h_B = B;

    int i;
    for(i=0;i<count;i++){
        *(h_A+i) = rand() % count; /* Or: h_A[i] = rand() % count; */
        *(h_B+i) = rand() % count; /* Or: h_B[i] = rand() % count; */
    }
    /* Display vectors A and B. */
    printf("\nFirst five values of A = ");
    for(i=0;i<5;i++){printf("%d ", A[i]);}
    printf("\nFirst five values of B = ");
    for(i=0;i<5;i++){printf("%d ", B[i]);}

    int *d_A, *d_B;

    err = cudaMalloc((void**)&d_A, count*sizeof(int));
    if (err != cudaSuccess){fprintf(stderr, "Failed to allocate device vector A (error code %s)!\n", cudaGetErrorString(err));exit(EXIT_FAILURE);}
    err = cudaMalloc((void**)&d_B, count*sizeof(int));
    if (err != cudaSuccess){fprintf(stderr, "Failed to allocate device vector B (error code %s)!\n", cudaGetErrorString(err));exit(EXIT_FAILURE);}

    err = cudaMemcpy(d_A, A, count*sizeof(int), cudaMemcpyHostToDevice);
    if (err != cudaSuccess){fprintf(stderr, "Failed to copy vector A from host to device (error code %s)!\n", cudaGetErrorString(err));exit(EXIT_FAILURE);}
    err = cudaMemcpy(d_B, B, count*sizeof(int), cudaMemcpyHostToDevice);
    if (err != cudaSuccess){fprintf(stderr, "Failed to copy vector B from host to device (error code %s)!\n", cudaGetErrorString(err));exit(EXIT_FAILURE);}

    /* Note: this launches 256 threads for 100 elements, and AddInts has no bounds check. */
    int numThreads = 256;
    int numBlocks = count/numThreads + 1;
    AddInts<<<numBlocks,numThreads>>>(d_A,d_B);
    err = cudaGetLastError();
    if (err != cudaSuccess){fprintf(stderr, "Failed to launch AddInts kernel (error code %s)!\n", cudaGetErrorString(err));exit(EXIT_FAILURE);}

    /* The sum is accumulated in place, so the result comes back in A. */
    err = cudaMemcpy(A, d_A, count*sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess){fprintf(stderr, "Failed to copy vector A from device to host (error code %s)!\n", cudaGetErrorString(err));exit(EXIT_FAILURE);}

    err = cudaFree(d_A);
    if (err != cudaSuccess){fprintf(stderr, "Failed to free device vector A (error code %s)!\n", cudaGetErrorString(err));exit(EXIT_FAILURE);}
    err = cudaFree(d_B);
    if (err != cudaSuccess){fprintf(stderr, "Failed to free device vector B (error code %s)!\n", cudaGetErrorString(err));exit(EXIT_FAILURE);}

    printf("\nFirst five values of A = ");
    for(i=0;i<5;i++){printf("%d ", A[i]);}

    printf("\n");
    return 0;
}

This is the kernel.cu file:

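// kernel.cu
/* Global index of the calling thread in a 1D grid. */
__device__ int get_global_index(){
    return (blockIdx.x * blockDim.x) + threadIdx.x;
}

/* Adds b into a element-wise. There is no bounds check, so each launched
   thread is assumed to map to a valid element. */
__global__ void AddInts(int *a, int *b){
    int ID = get_global_index();
    *(a+ID) += *(b+ID);
}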

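I am 100% sure that main.cu is correct. I also know that I could put the kernel directly in the main file, but that is not the purpose of my test. I know as well that I could drop the __device__ function and fold it into the __global__ one, but that is not my intention either.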

When I compile the test by typing nvcc main.cu kernel.cu in a terminal, I get the following error message:

/tmp/tmpxft_0000248b_00000000-30_kernel.o: In function `get_global_index()':
tmpxft_0000248b_00000000-8_kernel.cudafe1.cpp:(.text+0x15): multiple definition of `get_global_index()'
/tmp/tmpxft_0000248b_00000000-21_main.o:tmpxft_0000248b_00000000-3_main.cudafe1.cpp:(.text+0x15): first defined here
/tmp/tmpxft_0000248b_00000000-30_kernel.o: In function `__device_stub__Z7AddIntsPiS_(int*, int*)':
tmpxft_0000248b_00000000-8_kernel.cudafe1.cpp:(.text+0x7c): multiple definition of `__device_stub__Z7AddIntsPiS_(int*, int*)'
/tmp/tmpxft_0000248b_00000000-21_main.o:tmpxft_0000248b_00000000-3_main.cudafe1.cpp:(.text+0x68e): first defined here
/tmp/tmpxft_0000248b_00000000-30_kernel.o: In function `AddInts(int*, int*)':
tmpxft_0000248b_00000000-8_kernel.cudafe1.cpp:(.text+0xe5): multiple definition of `AddInts(int*, int*)'
/tmp/tmpxft_0000248b_00000000-21_main.o:tmpxft_0000248b_00000000-3_main.cudafe1.cpp:(.text+0x6f7): first defined here
collect2: error: ld returned 1 exit status

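I believe the error is caused by the definition of the device function get_global_index(), but I don't understand what is wrong with it. Does anyone know what is going on?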

Best answer

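Two options:

  1. Just compile main.cu (nvcc main.cu); it will pick up kernel.cu because you have already included it.

  2. Don't include kernel.cu in main.cu.

    When you include kernel.cu in main.cu and also pass both files to the compiler, the code in kernel.cu is compiled twice: once as part of main.cu and once as kernel.cu itself. Both object files then define get_global_index() and AddInts, and the linker reports the duplicates. If you choose this option, you will need to provide a prototype (forward declaration) for the AddInts kernel in main.cu, perhaps simply by including a header file that contains that prototype. In the more general case, if you spread things across more files, you may need to add -rdc=true to your compile command line, for example if you have files with __global__ functions that call __device__ functions defined in other files.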
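As a rough illustration of option 2, the sketch below keeps everything in one file so that it can be compiled on its own, and uses comments to mark where each piece would live after the split. The header name kernel.cuh is an assumption made here for illustration, as are the fixed test data and the choice of 256 elements (picked so that every launched thread has an element to work on, because AddInts has no bounds check).

```cuda
// Sketch only: one translation unit standing in for the three files of option 2.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// ---- would go in kernel.cuh (hypothetical header name), included by main.cu ----
// Declaration only, no body, so including it from several files is harmless.
__global__ void AddInts(int *a, int *b);

// ---- would stay in main.cu, without #include "kernel.cu" ----
int main(void) {
    const int count = 256;                 // assumption: one block, one thread per element
    const size_t bytes = count * sizeof(int);
    int h_A[count], h_B[count];
    for (int i = 0; i < count; i++) { h_A[i] = i; h_B[i] = 2 * i; }

    int *d_A = NULL, *d_B = NULL;
    if (cudaMalloc((void **)&d_A, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_B, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return EXIT_FAILURE;
    }
    if (cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice) != cudaSuccess ||
        cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        fprintf(stderr, "host-to-device copy failed\n");
        return EXIT_FAILURE;
    }

    // The prototype above is all the compiler needs to accept this launch.
    AddInts<<<1, count>>>(d_A, d_B);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    // This copy also waits for the kernel to finish.
    err = cudaMemcpy(h_A, d_A, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "device-to-host copy failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    // Each element should now hold i + 2*i.
    int mismatches = 0;
    for (int i = 0; i < count; i++) {
        if (h_A[i] != 3 * i) mismatches++;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_A);
    cudaFree(d_B);
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}

// ---- would stay in kernel.cu, which is then compiled exactly once ----
__device__ int get_global_index() {
    return (blockIdx.x * blockDim.x) + threadIdx.x;
}

__global__ void AddInts(int *a, int *b) {
    int ID = get_global_index();
    a[ID] += b[ID];
}
```

With the pieces split as marked, the original command nvcc main.cu kernel.cu should link cleanly, because each function body then appears in only one object file. In this layout get_global_index() sits in the same file as the kernel that calls it, so the -rdc=true case mentioned in the answer does not arise.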

Regarding "c - Getting a "multiple definition" error with a simple device function in CUDA C", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27446690/
