c++ - CUDA, Qt Creator, and Mac

I'm having a hard time integrating CUDA into Qt Creator.

I'm sure the problem is that my .pro file doesn't contain the right information. I've posted my current .pro file, my .cu file (DT_GPU.cu), and the errors below.

I've tried lots of combinations of .pro files taken from Linux and Windows setups, but none of them work. Also, I've never seen a Mac/CUDA .pro file, so this could be a useful resource for people in the future hoping to get these three working together.

Thanks in advance for your help.

.pro file:

CUDA_SOURCES += ../../Source/DT_GPU/DT_GPU.cu

CUDA_DIR = "/Developer/NVIDIA/CUDA-7.5"


SYSTEM_TYPE = 64 # '32' or '64', depending on your system
CUDA_ARCH = sm_21 # Type of CUDA architecture, for example 'compute_10', 'compute_11', 'sm_10'
NVCC_OPTIONS = --use_fast_math


# include paths
INCLUDEPATH += $$CUDA_DIR/include

# library directories
QMAKE_LIBDIR += $$CUDA_DIR/lib/

CUDA_OBJECTS_DIR = ./


# Add the necessary libraries
CUDA_LIBS = -lcublas_device \
-lcublas_static \
-lcudadevrt \
-lcudart_static \
-lcufft_static \
-lcufftw_static \
-lculibos \
-lcurand_static \
-lcusolver_static \
-lcusparse_static \
-lnppc_static \
-lnppi_static \
-lnpps_static

# The following makes sure all path names (which often include spaces) are put between quotation marks
CUDA_INC = $$join(INCLUDEPATH,'" -I"','-I"','"')
LIBS += $$join(CUDA_LIBS,'.so ', '', '.so')
#LIBS += $$CUDA_LIBS

# Configuration of the Cuda compiler
CONFIG(debug, debug|release) {
    # Debug mode
    cuda_d.input = CUDA_SOURCES
    cuda_d.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
    cuda_d.commands = $$CUDA_DIR/bin/nvcc -D_DEBUG $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
    cuda_d.dependency_type = TYPE_C
    QMAKE_EXTRA_COMPILERS += cuda_d
}
else {
    # Release mode
    cuda.input = CUDA_SOURCES
    cuda.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
    cuda.commands = $$CUDA_DIR/bin/nvcc $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
    cuda.dependency_type = TYPE_C
    QMAKE_EXTRA_COMPILERS += cuda
}

DT_GPU.cu

#include <cuda.h>
#include <cuda_runtime.h>
#include <device_launch_parameters.h>

__global__ void zero_GPU(double *l_p_array_gpu)
{
    int i = threadIdx.x;
    printf(" %i: Hello World!\n", i);
    l_p_array_gpu[i] = 0.;
}

void zero(double *l_p_array, int a_numElements)
{
    double *l_p_array_gpu;

    int size = a_numElements * int(sizeof(double));

    cudaMalloc((void**) &l_p_array_gpu, size);

    cudaMemcpy(l_p_array_gpu, l_p_array, size, cudaMemcpyHostToDevice);

    zero_GPU<<<size,1>>>(l_p_array_gpu);

    cudaMemcpy(l_p_array, l_p_array_gpu, size, cudaMemcpyDeviceToHost);

    cudaFree(l_p_array_gpu);
}

Warnings:

Makefile:848: warning: overriding commands for target `DT_GPU_cuda.o'
Makefile:792: warning: ignoring old commands for target `DT_GPU_cuda.o'
Makefile:848: warning: overriding commands for target `DT_GPU_cuda.o'
Makefile:792: warning: ignoring old commands for target `DT_GPU_cuda.o'

Errors:

In file included from ../SimplexSphereSource.cpp:8:
../../../Source/DT_GPU/DT_GPU.cu:75:19: error: expected expression
zero_GPU<<<size,1>>>(l_p_array_gpu);
^
../../../Source/DT_GPU/DT_GPU.cu:75:28: error: expected expression
zero_GPU<<<size,1>>>(l_p_array_gpu);
^
2 errors generated.
make: *** [SimplexSphereSource.o] Error 1
16:47:18: The process "/usr/bin/make" exited with code 2.
Error while building/deploying project SimplexSphereSource (kit: Desktop Qt 5.4.0 clang 64bit)
When executing step "Make"

Best answer

I managed to get your example running with some small corrections to your .pro file. If you or anyone else is still interested in a larger C++/CUDA/Qt example that works on Mac and Linux, check out this answer from a few months ago. Your particular situation (or at least what you've provided) doesn't need all the extra Qt frameworks and GUI setup, so the .pro file stays pretty simple.

If you haven't already, make sure you have the latest CUDA Mac drivers and check that some basic CUDA samples compile and run (a minimal sanity-check program is sketched after the list below). I'm currently using:

  • OS X 10.10.5
  • Qt 5.5.0
  • NVCC v7.5.17
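
For that sanity check, a minimal device-query-style program is enough. This is just a sketch of my own (not one of the NVIDIA samples), using only the standard CUDA runtime API; save it under any name ending in .cu and compile it with nvcc:

// Minimal sanity-check sketch: ask the CUDA runtime how many devices it can see.
// If this fails or reports 0 devices, fix the driver/toolkit before touching qmake.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int d = 0; d < deviceCount; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s (compute capability %d.%d)\n", d, prop.name, prop.major, prop.minor);
    }
    return 0;
}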

I added a main method to the DT_GPU.cu file you provided and successfully ran the program using your .pro file with a few changes:

#CUDA_SOURCES += ../../Source/DT_GPU/DT_GPU.cu
CUDA_SOURCES += DT_GPU.cu # <-- same dir for this small example

CUDA_DIR = "/Developer/NVIDIA/CUDA-7.5"


SYSTEM_TYPE = 64 # '32' or '64', depending on your system
CUDA_ARCH = sm_21 # (tested with sm_30 on my comp) Type of CUDA architecture, for example 'compute_10', 'compute_11', 'sm_10'
NVCC_OPTIONS = --use_fast_math


# include paths
INCLUDEPATH += $$CUDA_DIR/include

# library directories
QMAKE_LIBDIR += $$CUDA_DIR/lib/

CUDA_OBJECTS_DIR = ./


# Add the necessary libraries
CUDA_LIBS = -lcudart # <-- changed this

# The following makes sure all path names (which often include spaces) are put between quotation marks
CUDA_INC = $$join(INCLUDEPATH,'" -I"','-I"','"')
#LIBS += $$join(CUDA_LIBS,'.so ', '', '.so') <-- didn't need this
LIBS += $$CUDA_LIBS # <-- needed this


# SPECIFY THE R PATH FOR NVCC (this caused me a lot of trouble before)
QMAKE_LFLAGS += -Wl,-rpath,$$CUDA_DIR/lib # <-- added this
NVCCFLAGS = -Xlinker -rpath,$$CUDA_DIR/lib # <-- and this

# Configuration of the Cuda compiler
CONFIG(debug, debug|release) {
    # Debug mode
    cuda_d.input = CUDA_SOURCES
    cuda_d.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
    cuda_d.commands = $$CUDA_DIR/bin/nvcc -D_DEBUG $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
    cuda_d.dependency_type = TYPE_C
    QMAKE_EXTRA_COMPILERS += cuda_d
}
else {
    # Release mode
    cuda.input = CUDA_SOURCES
    cuda.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
    cuda.commands = $$CUDA_DIR/bin/nvcc $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
    cuda.dependency_type = TYPE_C
    QMAKE_EXTRA_COMPILERS += cuda
}

And the DT_GPU.cu file with a main function and a few small changes:

#include <cuda.h>
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#include <stdio.h> // <-- added for 'printf'


__global__ void zero_GPU(double *l_p_array_gpu)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // <-- in case you use more blocks
    printf(" %i: Hello World!\n", i);
    l_p_array_gpu[i] = 0.;
}


void zero(double *l_p_array, int a_numElements)
{
    double *l_p_array_gpu;

    int size = a_numElements * int(sizeof(double));

    cudaMalloc((void**) &l_p_array_gpu, size);

    cudaMemcpy(l_p_array_gpu, l_p_array, size, cudaMemcpyHostToDevice);

    // use one block with a_numElements threads
    zero_GPU<<<1, a_numElements>>>(l_p_array_gpu);

    cudaMemcpy(l_p_array, l_p_array_gpu, size, cudaMemcpyDeviceToHost);

    cudaFree(l_p_array_gpu);
}

// added a main function to run the program
int main(void)
{
    // host variables
    const int a_numElements = 5;
    double l_p_array[a_numElements];

    // run cuda function
    zero(l_p_array, a_numElements);

    // Print l_p_array
    printf("l_p_array: { ");
    for (int i = 0; i < a_numElements; ++i)
    {
        printf("%.2f ", l_p_array[i]);
    }
    printf("}\n");

    return 0;
}

Output:

 0: Hello World!
 1: Hello World!
 2: Hello World!
 3: Hello World!
 4: Hello World!
l_p_array: { 0.00 0.00 0.00 0.00 0.00 }

Once you get this working, make sure you spend some time going over basic CUDA syntax and examples before getting much deeper into it; otherwise debugging will be a real hassle. While I'm here, though, I figured I'd also let you know that the CUDA kernel launch syntax is
kernel_function<<<number_of_blocks, threads_per_block>>>(args).
Your kernel call zero_GPU<<<size,1>>>(l_p_array_gpu) actually creates a bunch of blocks with a single thread each, when you really want the opposite.
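
Just to make that concrete, here's a quick sketch of my own (not code from the files above): the same kernel launched the right way around when you do use multiple blocks, with an element count n and a bounds check added so spare threads in the last block do nothing, plus a check of the launch status:

// Sketch only: the 'n' parameter, the bounds check, and the error check are my
// additions and are not part of the original DT_GPU.cu.
__global__ void zero_GPU(double *l_p_array_gpu, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                                  // spare threads in the last block do nothing
        l_p_array_gpu[i] = 0.;
}

// e.g. 256 threads per block, enough blocks to cover a_numElements
int threadsPerBlock = 256;
int numBlocks = (a_numElements + threadsPerBlock - 1) / threadsPerBlock;
zero_GPU<<<numBlocks, threadsPerBlock>>>(l_p_array_gpu, a_numElements);

cudaError_t launchErr = cudaGetLastError();     // catches bad launch configurations
if (launchErr != cudaSuccess)
    printf("kernel launch failed: %s\n", cudaGetErrorString(launchErr));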

The following functions, taken from the CUDA samples, help determine how many threads and blocks you need for a given number of elements:

typedef unsigned int uint;

// integer division, rounded up
inline uint iDivUp(uint a, uint b)
{
    return (a % b != 0) ? (a / b + 1) : (a / b);
}

// compute grid and thread block size for a given number of elements
// (min() here is std::min from <algorithm>, or the min provided by the CUDA sample helpers)
inline void computeGridSize(uint n, uint blockSize, uint &numBlocks, uint &numThreads)
{
    numThreads = min(blockSize, n);
    numBlocks = iDivUp(n, numThreads);
}

You can add these to the top of your .cu file or to a helper header and use them to call your kernel functions correctly. If you want to use them in your DT_GPU.cu file, you would just add:

// desired thread count (may change if there aren't enough elements)
dim3 threads(64);
// default block count (will also change based on number of elements)
dim3 blocks(1);
computeGridSize(a_numElements, threads.x, blocks.x, threads.x);

// run kernel
zero_GPU<<<blocks, threads>>>(l_p_array_gpu);

Anyway, that got a bit off track, but I hope this helps! Cheers!

Regarding c++ - CUDA, Qt Creator, and Mac, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33875285/
