gpt4 book ai didi

gpu - 如何在 Pytorch 中修复 "RuntimeError: CUDA error: device-side assert triggered"

转载 作者:行者123 更新时间:2023-12-03 23:12:26 27 4
gpt4 key购买 nike

我正在尝试从此存储库中训练 yolo-v3 模型 https://github.com/eriklindernoren/PyTorch-YOLOv3
在我的自定义形状数据集上,但我不断收到错误“运行时错误:CUDA 错误:触发设备端断言”

我试图查找解决方案并尝试了不同答案中建议的几件事(例如修复注释中类的索引),但错误仍然存​​在。

我正在按照 repo 自述文件中的描述来训练自定义数据集,并相应地调整了 custom.data 和 data/custom/。

我一直收到这个输出。

C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [33,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [34,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [35,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [36,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [37,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [38,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [39,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [40,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [41,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [42,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [43,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [44,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [45,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [1,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [2,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [3,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [4,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [5,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [6,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [7,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [12,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [13,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [14,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [15,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [20,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [21,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [22,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [23,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [24,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [25,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [26,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [27,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [28,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [29,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [31,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
File "train.py", line 105, in <module>
loss, outputs = model(imgs, targets)
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "D:\Documents\GP\Code\TorchYolo\PyTorch-YOLOv3\models.py", line 259, in forward
x, layer_loss = module[0](x, targets, img_dim)
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "D:\Documents\GP\Code\TorchYolo\PyTorch-YOLOv3\models.py", line 188, in forward
ignore_thres=self.ignore_thres,
File "D:\Documents\GP\Code\TorchYolo\PyTorch-YOLOv3\utils\utils.py", line 318, in build_targets
iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)
File "D:\Documents\GP\Code\TorchYolo\PyTorch-YOLOv3\utils\utils.py", line 199, in bbox_iou
b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
RuntimeError: CUDA error: device-side assert triggered


与 train.jpg 类标签索引混淆时,唯一改变的是数组索引中的“2”
b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2

最佳答案

我在 google colab 上训练 resnet 模型时也遇到了这个问题。所以在我的例子中,我在 7 个类别上训练模型,但我的网络的最后一层设置为输出 3 个类别。
因此我改变了

self.classifier = torch.nn.Sequential(torch.nn.BatchNorm1d(512), torch.nn.Linear(512, 3))
对此
self.classifier = torch.nn.Sequential(torch.nn.BatchNorm1d(512), torch.nn.Linear(512, 7))
这样做之后,我仍然收到错误消息,因为我没有重新启动 google colab。
请记住,每当您收到此错误时 , 检查两件事:
  • class_labels 应该从 0 开始即在我的情况下 [0,1,2,3,4,5,6] 为 7 个类(class)。
  • 检查最终输出层是否输出准确的类数 .

  • 在那之后,
    刷新笔记本以刷新所有 cuda 断言。
    在出现任何 CUDA 错误后,重新启动笔记本,否则您将继续收到 CUDA 错误,因为之前的断言尚未清除。通过重新启动笔记本,您将清除所有 cuda 断言。

    关于gpu - 如何在 Pytorch 中修复 "RuntimeError: CUDA error: device-side assert triggered",我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/58242415/

    27 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com