
debugging - Is a shared memory address passed to a device function still shared memory?

Reposted. Author: 行者123. Last updated: 2023-12-02 21:49:13

Suppose I have this __device__ function:

__device__ unsigned char* dev_kernel(unsigned char* array_sh, int params){
return array_sh + params;
}

In the __global__ kernel, I use it like this:

uarray = dev_kernel (uarray, params);

where uarray is an array located in shared memory.

But when I use cuda-gdb to inspect the address of uarray in the __global__ kernel, I get:

(@generic unsigned char * @shared) 0x1000010 "z\377*"

In the __device__ function I get:

(unsigned char * @generic) 0x1000010 <Error reading address 0x1000010: Operation not permitted>

Despite the error, the program runs correctly (perhaps this is a limitation of cuda-gdb).

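So, I want to know: within the __device__ function, is uarray still in shared memory? I moved the array from global memory to shared memory, and the run time is almost the same (slightly worse with shared memory).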

Best answer

Quoting the question: "So, I want to know: within the __device__ kernel, is uarray still shared?"

Yes. When you pass a pointer to shared memory into a device function this way, it still points to the same location in shared memory.
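As a minimal sketch of that claim (not part of the original thread), the kernel below reuses the question's dev_kernel and then asks the hardware directly with the address-space predicate __isShared(). That intrinsic exists in recent CUDA toolkits, not in the CUDA 5.5 release used in the transcript. The names check_kernel and run_check are hypothetical, and like the answer's example the sketch assumes a single-thread launch.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define SSIZE 256

// Same shape as the question's device function: plain pointer arithmetic.
__device__ unsigned char* dev_kernel(unsigned char* array_sh, int params){
    return array_sh + params;
}

// out[0]: value read through the pointer returned by dev_kernel
// out[1]: 1 if __isShared() reports that pointer as a shared-memory address
__global__ void check_kernel(int params, unsigned int* out){
    __shared__ unsigned char myshared[SSIZE];
    for (int i = 0; i < SSIZE; i++)   // one thread fills the array, as in t249.cu
        myshared[i] = (unsigned char) i;
    unsigned char* loc = dev_kernel(myshared, params);
    out[0] = *loc;
    out[1] = __isShared(loc) ? 1u : 0u;
}

// Hypothetical host helper: launches one thread and copies both results back.
void run_check(int params, unsigned int* value, unsigned int* is_shared){
    unsigned int* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, 2 * sizeof(unsigned int));
    assert(err == cudaSuccess);
    check_kernel<<<1, 1>>>(params, d_out);
    err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    unsigned int h_out[2];
    err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);
    *value = h_out[0];
    *is_shared = h_out[1];
}
```

Calling run_check(5, &v, &s) should leave v == 5 (myshared[5] was set to 5) and s == 1, because the pointer returned from the device function still refers to the kernel's shared array.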

In response to the questions posted below, which puzzle me, I have chosen to show a simple example:

$ cat t249.cu
#include <stdio.h>

#define SSIZE 256

__device__ unsigned char* dev_kernel(unsigned char* array_sh, int params){
return array_sh + params;
}

__global__ void mykernel(){
__shared__ unsigned char myshared[SSIZE];
__shared__ unsigned char *u_array;
for (int i = 0; i< SSIZE; i++)
myshared[i] = (unsigned char) i;
unsigned char *loc = dev_kernel(myshared, 5);
u_array = loc;
printf("val = %d\n", *loc);
printf("val = %d\n", *u_array);
}

int main(){

mykernel<<<1,1>>>();
cudaDeviceSynchronize();
return 0;
}
$ nvcc -arch=sm_20 -g -G -o t249 t249.cu
$ cuda-gdb ./t249
NVIDIA (R) CUDA Debugger
5.5 release
....
Reading symbols from /home/user2/misc/t249...done.
(cuda-gdb) break mykernel
Breakpoint 1 at 0x4025dc: file t249.cu, line 9.
(cuda-gdb) run
Starting program: /home/user2/misc/t249
[Thread debugging using libthread_db enabled]

Breakpoint 1, mykernel () at t249.cu:9
9 __global__ void mykernel(){
(cuda-gdb) break 14
Breakpoint 2 at 0x4025e1: file t249.cu, line 14.
(cuda-gdb) continue
Continuing.
[New Thread 0x7ffff725a700 (LWP 26184)]
[Context Create of context 0x67e360 on Device 0]
[Launch of CUDA Kernel 0 (mykernel<<<(1,1,1),(1,1,1)>>>) on Device 0]
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 2, warp 0, lane 0]

Breakpoint 1, mykernel<<<(1,1,1),(1,1,1)>>> () at t249.cu:12
12 for (int i = 0; i< SSIZE; i++)
(cuda-gdb) continue
Continuing.

Breakpoint 2, mykernel<<<(1,1,1),(1,1,1)>>> () at t249.cu:14
14 unsigned char *loc = dev_kernel(myshared, 5);
(cuda-gdb) print &(myshared[0])
$1 = (@shared unsigned char *) 0x8 ""
      ^
      |
cuda-gdb is telling you that this pointer is defined in a __shared__ statement, and therefore its storage is implicit and it is unmodifiable.

(cuda-gdb) print &(u_array)
$2 = (@generic unsigned char * @shared *) 0x0
      ^                        ^
      |                        u_array is stored in shared memory.
u_array is a generic pointer, meaning it can point to anything.

(cuda-gdb) step
dev_kernel(unsigned char * @generic, int) (array_sh=0x1000008 "", params=5)
at t249.cu:6
6 return array_sh + params;
(cuda-gdb) print array_sh
$3 = (@generic unsigned char * @register) 0x1000008 ""
      ^                        ^
      |                        array_sh is stored in a register.
array_sh is a generic pointer, meaning it can point to anything.

(cuda-gdb) print u_array
No symbol "u_array" in current context.
(Note that I can't access u_array from inside the __device__ function, so I don't understand your comment there.)

(cuda-gdb) step
mykernel<<<(1,1,1),(1,1,1)>>> () at t249.cu:15
15 u_array = loc;
(cuda-gdb) step
16 printf("val = %d\n", *loc);
(cuda-gdb) print u_array
$4 = (@generic unsigned char * @shared) 0x100000d ......
      ^                        ^
      |                        u_array is stored in shared memory.
u_array is a generic pointer, meaning it can point to anything.
(cuda-gdb)
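In the session above, myshared appears as the @shared address 0x8 while array_sh shows the generic address 0x1000008, and u_array ends up at 0x100000d, which is 0x1000008 + 5. Those are two views of the same location: 0x1000000 is where the shared window sat in the generic address space on that device. The sketch below, which assumes a recent CUDA toolkit providing the __cvta_generic_to_shared() conversion intrinsic, makes this visible from inside a kernel. offset_kernel and run_offsets are hypothetical names, and the concrete base and generic values will differ from GPU to GPU; only their difference is meaningful.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

__device__ unsigned char* dev_kernel(unsigned char* array_sh, int params){
    return array_sh + params;
}

// out[0]: shared-window offset of the returned pointer minus that of myshared
// out[1]: shared-window offset of myshared (compare with "@shared ... 0x8")
// out[2]: generic address of myshared (compare with "array_sh=0x1000008")
__global__ void offset_kernel(int params, unsigned long long* out){
    __shared__ unsigned char myshared[256];
    unsigned char* loc = dev_kernel(myshared, params);
    size_t base = __cvta_generic_to_shared(myshared);
    size_t off  = __cvta_generic_to_shared(loc);
    out[0] = (unsigned long long)(off - base);
    out[1] = (unsigned long long)base;
    out[2] = (unsigned long long)(uintptr_t)myshared;
}

// Hypothetical host helper: one-thread launch, copies the three values back.
void run_offsets(int params, unsigned long long* delta,
                 unsigned long long* shared_off, unsigned long long* generic){
    unsigned long long* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, 3 * sizeof(unsigned long long));
    assert(err == cudaSuccess);
    offset_kernel<<<1, 1>>>(params, d_out);
    err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    unsigned long long h[3];
    err = cudaMemcpy(h, d_out, sizeof(h), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);
    *delta = h[0];
    *shared_off = h[1];
    *generic = h[2];
}
```

With params = 5, delta comes back as 5: the pointer handed back by the device function is the same shared location, just expressed as a generic address.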

Although you didn't provide it, based on the cuda-gdb output you obtained, I assume your definition of u_array is similar to mine.

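Note that indicators like @shared do not tell you what kind of memory the pointer points to. They tell you either what kind of pointer it is (defined implicitly in a __shared__ statement) or where the pointer itself is stored (in shared memory).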
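To illustrate the distinction, here is a sketch assuming a recent CUDA toolkit with the address-space predicates __isShared() and __isGlobal(). The pointer p is declared __shared__, so it is stored in shared memory just like u_array, but it is set to point at a global buffer obtained from cudaMalloc. Asking about &p describes where the pointer lives; asking about p describes what it points to. where_kernel and run_where are hypothetical names.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// p itself is stored in shared memory; its value is a global-memory address.
__global__ void where_kernel(int* gdata, unsigned int* out){
    __shared__ int* p;
    p = gdata;
    out[0] = __isShared(&p) ? 1u : 0u;  // storage of the pointer: shared
    out[1] = __isShared(p)  ? 1u : 0u;  // target of the pointer: not shared
    out[2] = __isGlobal(p)  ? 1u : 0u;  // target of the pointer: global
}

// Hypothetical host helper: one-thread launch, copies the three flags back.
void run_where(unsigned int flags[3]){
    int* d_data = nullptr;
    unsigned int* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_data, sizeof(int));
    assert(err == cudaSuccess);
    err = cudaMalloc(&d_out, 3 * sizeof(unsigned int));
    assert(err == cudaSuccess);
    where_kernel<<<1, 1>>>(d_data, d_out);
    err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    err = cudaMemcpy(flags, d_out, 3 * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);
    cudaFree(d_data);
}
```

The expected flags are {1, 0, 1}: cuda-gdb would label p as "@generic int * @shared", yet the memory it points to is global.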

If this doesn't resolve your question, please provide a complete example along with the complete cuda-gdb session output, as I have done.

A similar question, "debugging - Is a shared memory address passed to a device function still shared memory?", can be found on Stack Overflow: https://stackoverflow.com/questions/18987045/
