
CUDA Function Pointers


I'm trying to do something like the following in CUDA (in practice I need to write some integration functions):

#include <iostream>
using namespace std;

float f1(float x) {
    return x * x;
}

float f2(float x) {
    return x;
}

void tabulate(float p_f(float)) {
    for (int i = 0; i != 10; ++i) {
        std::cout << p_f(i) << ' ';
    }
    std::cout << std::endl;
}

int main() {
    tabulate(f1);
    tabulate(f2);
    return 0;
}

Output:

0 1 4 9 16 25 36 49 64 81
0 1 2 3 4 5 6 7 8 9


I tried the following, but I only get an error:

Error: Function pointers and function template parameters are not supported in sm_1x.

float f1(float x) {
    return x;
}

__global__ void tabulate(float lower, float upper, float p_function(float), float* result) {
    for (lower; lower < upper; lower++) {
        *result = *result + p_function(lower);
    }
}

int main() {
    float res;
    float* dev_res;

    cudaMalloc( (void**)&dev_res, sizeof(float) );

    tabulate<<<1,1>>>(0.0, 5.0, f1, dev_res);
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);

    printf("%f\n", res);
    /************************************************************************/
    scanf("%s");

    return 0;
}

Best Answer

To get rid of the compilation error, you have to pass -gencode arch=compute_20,code=sm_20 as a compiler argument when building your code. Even then, you may run into some runtime problems, described in the CUDA Programming Guide excerpt below.
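For illustration, assuming the source file is saved as tabulate.cu (the file name is just a placeholder), the build command would look something like this:

nvcc -gencode arch=compute_20,code=sm_20 tabulate.cu -o tabulate

On newer toolkits that have dropped sm_20 support, you would substitute your GPU's actual compute capability (for example arch=compute_52,code=sm_52).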

From the CUDA Programming Guide (http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#functions):

Function pointers to __global__ functions are supported in host code, but not in device code. Function pointers to __device__ functions are only supported in device code compiled for devices of compute capability 2.x and higher.

It is not allowed to take the address of a __device__ function in host code.

So you could have something like this (adapted from the "FunctionPointers" sample):

//your function pointer type - returns unsigned char, takes parameters of type unsigned char and float
typedef unsigned char(*pointFunction_t)(unsigned char, float);

//some device function to be pointed to
__device__ unsigned char
Threshold(unsigned char in, float thresh)
{
...
}

//pComputeThreshold is a device-side function pointer to your __device__ function
__device__ pointFunction_t pComputeThreshold = Threshold;
//the host-side function pointer to your __device__ function
pointFunction_t h_pointFunction;

//in host code: copy the function pointers to their host equivalent
cudaMemcpyFromSymbol(&h_pointFunction, pComputeThreshold, sizeof(pointFunction_t));

You can then pass h_pointFunction to your kernel as a parameter, and the kernel can use it to invoke your __device__ function:

//your kernel taking your __device__ function pointer as a parameter
__global__ void kernel(pointFunction_t pPointOperation)
{
    unsigned char tmp;
    ...
    tmp = (*pPointOperation)(tmp, 150.0);
    ...
}

//invoke the kernel in host code, passing in your host-side __device__ function pointer
kernel<<<...>>>(h_pointFunction);

Hopefully this makes sense. In summary, it looks like you will have to change your f1 function to be a __device__ function and follow a similar process (the typedefs aren't necessary, but they do make the code nicer) to get a valid function pointer on the host side that you can pass to your kernel. I'd also recommend taking a look at the FunctionPointers CUDA sample.
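As a rough sketch of how the asker's integration kernel could be adapted along those lines (the names pFunction_t, d_f1 and h_f1 are my own for illustration, and this assumes a device of compute capability 2.x or later):

#include <cstdio>
#include <cuda_runtime.h>

// function pointer type: takes a float, returns a float
typedef float (*pFunction_t)(float);

// the function to be integrated, now a __device__ function
__device__ float f1(float x) {
    return x;
}

// device-side pointer to f1, so the host can fetch its address
__device__ pFunction_t d_f1 = f1;

__global__ void tabulate(float lower, float upper, pFunction_t p_function, float* result) {
    for (; lower < upper; lower++) {
        *result = *result + p_function(lower);
    }
}

int main() {
    float res = 0.0f;
    float* dev_res;

    cudaMalloc((void**)&dev_res, sizeof(float));
    cudaMemcpy(dev_res, &res, sizeof(float), cudaMemcpyHostToDevice);

    // copy the device-side function pointer to its host equivalent
    pFunction_t h_f1;
    cudaMemcpyFromSymbol(&h_f1, d_f1, sizeof(pFunction_t));

    tabulate<<<1,1>>>(0.0f, 5.0f, h_f1, dev_res);
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);

    printf("%f\n", res);   // 0 + 1 + 2 + 3 + 4 = 10
    return 0;
}

The key difference from the original attempt is that f1 is now a __device__ function, and its address is obtained on the host via cudaMemcpyFromSymbol rather than by naming f1 directly in host code.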

Regarding CUDA function pointers, there is a similar question on Stack Overflow: https://stackoverflow.com/questions/15644261/
