
linux - Invoke kernel failure through cuda-gdb?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 00:46:37

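Is there a way to invoke a kernel failure using cuda-gdb? I have tried stepping through the kernel code and setting invalid index positions and odd values for variables, but after continuing from those bad settings I have not been able to trigger a "kernel execution failed" error.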

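Does anyone know the proper way to do this through cuda-gdb? I have read through the cuda-gdb documentation twice, but I may have missed a clue on how to achieve this, if it is possible at all. If anyone knows of any tools or techniques for this, that would be most appreciated. Thanks.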

I am on CentOS 7 and my device has compute capability 2.1. The output of uname -a is shown below.

Linux john 3.10.0-327.10.1.el7.x86_64 #1 SMP Tue Feb 16 17:03:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Best answer

Is there a way to invoke kernel failure using cuda-gdb?

Yes, it is possible. Here is a complete worked example:

$ cat t678.cu
#include <stdio.h>
__global__ void kernel(int *data){

  int idx = 0; // line 4
  idx += data[0];
  int tval = data[idx];
  data[1] = tval;
}

int main(){

  int *d_data;
  cudaMalloc(&d_data, 32*sizeof(int));
  cudaMemset(d_data, 0, 32*sizeof(int));
  kernel<<<1,1>>>(d_data);
  cudaDeviceSynchronize();
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess) printf("kernel fail %s\n", cudaGetErrorString(err));
}
$ nvcc -g -G -o t678 t678.cu
$ cuda-gdb ./t678
NVIDIA (R) CUDA Debugger
7.5 release
Portions Copyright (C) 2007-2015 NVIDIA Corporation
GNU gdb (GDB) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/user2/misc/t678...done.
(cuda-gdb) break t678.cu:4
Breakpoint 1 at 0x4026d5: file t678.cu, line 4.
(cuda-gdb) run
Starting program: /home/user2/misc/./t678
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff700a700 (LWP 8693)]
[Switching focus to CUDA kernel 0, grid 2, block (0,0,0), thread (0,0,0), device 0, sm 14, warp 2, lane 0]

Breakpoint 1, kernel<<<(1,1,1),(1,1,1)>>> (data=0x13047a0000) at t678.cu:4
4 int idx = 0; // line 4
(cuda-gdb) step
5 idx += data[0];
(cuda-gdb) print idx
$1 = 0
(cuda-gdb) set idx=1000000
(cuda-gdb) step
6 int tval = data[idx];
(cuda-gdb) print idx
$2 = 1000000
(cuda-gdb) step

CUDA Exception: Device Illegal Address
The exception was triggered in device 0.

Program received signal CUDA_EXCEPTION_10, Device Illegal Address.
kernel<<<(1,1,1),(1,1,1)>>> (data=0x13047a0000) at t678.cu:7
7 data[1] = tval;
(cuda-gdb)

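As the cuda-gdb output above shows, setting the idx variable to a large value from the debugger causes an index-out-of-bounds (illegal address) error when the following line is executed: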

  int tval = data[idx];
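
If you also want the host program to report the failure, the error checking in the example can be made more explicit. The following is only a sketch under a few assumptions: the check helper and its message strings are hypothetical names introduced here and are not part of the original answer, and the source line numbers differ from t678.cu, so a breakpoint would have to be placed on the line that declares idx instead of on line 4. It checks cudaGetLastError() right after the launch and then the return value of cudaDeviceSynchronize(), which is where a fault raised during kernel execution is normally reported.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original answer): print a CUDA error
// together with a short description of the step that produced it.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        printf("%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Same kernel body as in the answer: idx is taken from data[0], so changing
// idx (or data[0]) in the debugger redirects the read on the next line.
__global__ void kernel(int *data) {
    int idx = 0;
    idx += data[0];
    int tval = data[idx];
    data[1] = tval;
}

int main() {
    int *d_data = nullptr;
    if (!check(cudaMalloc(&d_data, 32 * sizeof(int)), "cudaMalloc")) return 1;
    if (!check(cudaMemset(d_data, 0, 32 * sizeof(int)), "cudaMemset")) return 1;

    kernel<<<1, 1>>>(d_data);
    // Launch-time problems (for example an invalid configuration) show up here.
    if (!check(cudaGetLastError(), "kernel launch")) return 1;
    // Faults that occur while the kernel runs are reported by the synchronize call.
    if (!check(cudaDeviceSynchronize(), "kernel execution")) return 1;

    cudaFree(d_data);
    return 0;
}
```

Run without the debugger, this program should finish silently, because data[0] is 0 and every access stays inside the allocation. The failure message is expected only when idx is overwritten from cuda-gdb as in the session above.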

For more on "linux - Invoke kernel failure through cuda-gdb?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/36409906/
