CUDA atomicAdd for doubles definition error


In previous CUDA versions, atomicAdd was not implemented for doubles, so it was common to implement it yourself, as done here, for example. With the new CUDA 8 RC, I run into trouble when I try to compile code that includes such a function. I guess this is because with Pascal and Compute Capability 6.0 a native double version of atomicAdd has been added, but somehow it is not properly ignored for previous Compute Capabilities.

The code below used to compile and run fine with previous CUDA versions, but now I get this compilation error:

test.cu(3): error: function "atomicAdd(double *, double)" has already been defined

But if I remove my implementation, I instead get this error:
test.cu(33): error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (double *, double)

I should add that I only see this if I compile with -arch=sm_35 or similar. If I compile with -arch=sm_60 I get the expected behaviour, i.e. only the first error, and successful compilation in the second case.

Edit: Also, it is specific to atomicAdd -- if I change the name, it works fine.

It really looks like a compiler bug. Can someone else confirm that this is the case?

Sample code:
__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

__global__ void kernel(double *a)
{
    double b = 1.3;
    atomicAdd(a, b);
}

int main(int argc, char **argv)
{
    double *a;
    cudaMalloc(&a, sizeof(double));

    kernel<<<1,1>>>(a);

    cudaFree(a);
    return 0;
}
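
For reference, the two behaviours described above can be reproduced with compile lines roughly like these (a sketch only; it assumes the snippet is saved as test.cu and that no further flags are needed):

$> nvcc test.cu -arch=sm_35   # redefinition error; removing the custom atomicAdd instead gives "no instance of overloaded function"
$> nvcc test.cu -arch=sm_60   # redefinition error; removing the custom atomicAdd makes it compile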

Edit: I got an answer from Nvidia, who acknowledge the problem. Here is what the developers have to say about it:

The sm_60 architecture, that is newly supported in CUDA 8.0, has native fp64 atomicAdd function. Because of the limitations of our toolchain and CUDA language, the declaration of this function needs to be present even when the code is not being specifically compiled for sm_60. This causes a problem in your code because you also define a fp64 atomicAdd function.

CUDA builtin functions such as atomicAdd are implementation-defined and can be changed between CUDA releases. Users should not define functions with the same names as any CUDA builtin functions. We would suggest you to rename your atomicAdd function to one that is not the same as any CUDA builtin functions.
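
Following that suggestion, one straightforward workaround is to rename the custom function so it can never collide with a CUDA builtin. The name myAtomicAdd below is arbitrary (not part of any CUDA API), and the body is just the usual compare-and-swap loop from the question; a minimal sketch:

// Renamed helper (myAtomicAdd is an arbitrary name, not a CUDA builtin);
// the body is the standard CAS emulation of a double atomic add.
__device__ double myAtomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

__global__ void kernel(double *a)
{
    myAtomicAdd(a, 1.3);   // call the renamed helper instead of atomicAdd
}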

Best Answer

This flavour of atomicAdd is a new method introduced for compute capability 6.0. You may keep your previous implementation for other compute capabilities by guarding it with a macro definition:

#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
<... place here your own pre-pascal atomicAdd definition ...>
#endif

This macro, called the architecture identification macro, is documented here:

5.7.4. Virtual Architecture Identification Macro

The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy.

This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.



I would assume NVIDIA did not put it in place for previous CCs to avoid conflicts with users who define it themselves, rather than to push a move to Compute Capability >= 6.x. I would not consider it a BUG, though, rather a release delivery practice.

Edit: The macro guard was incomplete (now fixed) - here is a complete example.
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
__device__ double atomicAdd(double* a, double b) { return b; }
#endif

__device__ double s_global;
__global__ void kernel() { atomicAdd(&s_global, 1.0); }

int main(int argc, char* argv[])
{
    kernel<<<1,1>>>();
    return ::cudaDeviceSynchronize();
}

Compiled with:
$> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26

Command lines (both succeed):
$> nvcc main.cu -arch=sm_60
$> nvcc main.cu -arch=sm_35

You may find out why it works by looking at the include file sm_60_atomic_functions.h, where the method is not declared if __CUDA_ARCH__ is lower than 600.
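
The relevant part of that header presumably follows the same guard pattern as the example above (a sketch of the shape only, not the verbatim header contents):

// Sketch only: how the declaration is expected to be guarded in
// sm_60_atomic_functions.h (the actual header differs in its details).
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
__device__ double atomicAdd(double *address, double val);
#endif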

Regarding this CUDA atomicAdd for doubles definition error, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37566987/
