debugging - PyTorch error: Could not run 'aten::slow_conv3d_forward' with arguments from the 'CUDATensorId' backend

Reposted by: 行者123. Last updated: 2023-12-03 15:48:37

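I am training a CNN on a CUDA GPU. It takes 3D medical images as input and outputs a class prediction. I suspect there may be a bug in PyTorch. I am running PyTorch 1.4.0, and the GPU is a "Tesla P100-PCIE-16GB". When I run the model on CUDA I get this error: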

Traceback (most recent call last):
  File "/home/ub/miniconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-55-cc0dd3d9cbb7>", line 1, in <module>
    net(cc)
  File "/home/ub/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "<ipython-input-2-19e11966d1cd>", line 181, in forward
    out = self.layer1(x)
  File "/home/ub/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ub/miniconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/ub/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ub/miniconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 480, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Could not run 'aten::slow_conv3d_forward' with arguments from the 'CUDATensorId' backend. 'aten::slow_conv3d_forward' is only available for these backends: [CPUTensorId, VariableTensorId].

To reproduce the problem:

import torch
from torch import nn

# Input is a batch of 64x64x64 3D images with 2 channels
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv3d(2, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv3d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2, stride=2))
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(16 * 16*16 * 64, 1000)
        self.fc2 = nn.Linear(1000, 2)
        # self.softmax =  nn.LogSoftmax(dim=1)

    def forward(self, x):
        # print(x.shape)
        out = self.layer1(x)
        # print(out.shape)
        out = self.layer2(out)
        # print(out.shape)
        out = out.reshape(out.size(0), -1)
        # print(out.shape)
        out = self.drop_out(out)
        # print(out.shape)
        out = self.fc1(out)
        # print(out.shape)
        out = self.fc2(out)
        # out = self.softmax(out)
        # print(out.shape)
        return out


net = ConvNet()
input = torch.randn(16, 2, 64, 64, 64)
net(input)

Best answer

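At first I thought the error message meant that 'aten::slow_conv3d_forward' is not implemented for the GPU (CUDA). After looking at your network, though, that did not make sense to me: Conv3D is a very basic operation, and the PyTorch team would surely have implemented it in CUDA.

I then dug into the source code a little and found that the input was not a CUDA tensor, which is what causes the problem.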
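
A mismatch like this can be caught before the forward pass by comparing the device of the model's parameters with the device of the input. The following is only a minimal sketch: `check_same_device` is a hypothetical helper introduced for illustration, not part of PyTorch, and the small `Conv3d` layer stands in for the full network.

```python
import torch
from torch import nn


def check_same_device(module: nn.Module, tensor: torch.Tensor) -> bool:
    """Hypothetical helper: True if the tensor is on the same device
    as the module's first parameter."""
    param_device = next(module.parameters()).device
    return param_device == tensor.device


conv = nn.Conv3d(2, 4, kernel_size=3, padding=1)
x = torch.randn(1, 2, 8, 8, 8)

# Freshly created modules and tensors both live on the CPU.
same_on_cpu = check_same_device(conv, x)
print("same device on CPU:", same_on_cpu)

if torch.cuda.is_available():
    conv.cuda()  # parameters move to the GPU, x stays on the CPU
    print("same device after conv.cuda():", check_same_device(conv, x))
    x = x.cuda()  # reassign so x now refers to the GPU copy
    print("same device after x = x.cuda():", check_same_device(conv, x))
```

Calling a check like this right before `net(input)` would point at the device mismatch directly instead of surfacing as a backend dispatch error.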

Here is a working example:

import torch
from torch import nn

#input is a 64,64,64 3d image batch with 2 channels
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv3d(2, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv3d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2, stride=2))
        self.drop_out = nn.Dropout()
        self.fc1 = nn.Linear(16 * 16*16 * 64, 1000)
        self.fc2 = nn.Linear(1000, 2)
        # self.softmax =  nn.LogSoftmax(dim=1)

    def forward(self, x):
        # print(x.shape)
        out = self.layer1(x)
        # print(out.shape)
        out = self.layer2(out)
        # print(out.shape)
        out = out.reshape(out.size(0), -1)
        # print(out.shape)
        out = self.drop_out(out)
        # print(out.shape)
        out = self.fc1(out)
        # print(out.shape)
        out = self.fc2(out)
        # out = self.softmax(out)
        # print(out.shape)
        return out


net = ConvNet()
input = torch.randn(16, 2, 64, 64, 64)
net.cuda()
input = input.cuda() # IMPORTANT to reassign your tensor
net(input)

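Remember that when you move a model from the CPU to the GPU you can simply call .cuda(), because it moves the module's parameters in place. When you move a tensor from the CPU to the GPU, however, you need to reassign it, e.g. tensor = tensor.cuda(), rather than only calling tensor.cuda(), because that call returns a copy and leaves the original tensor on the CPU. Hope this helps.

Output: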
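
The same rule can be written in a device-agnostic way with `.to(device)`, which also runs on machines without a GPU. This is a sketch under assumptions rather than the answer's own code: the small `nn.Sequential` model and the names `device`, `moved_net`, and `x` are placeholders chosen for illustration.

```python
import torch
from torch import nn

# Pick the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = nn.Sequential(nn.Conv3d(2, 4, kernel_size=3, padding=1), nn.ReLU())
x = torch.randn(1, 2, 8, 8, 8)

# Module.to() moves the parameters in place and returns the module itself,
# so reassigning is optional for a model.
moved_net = net.to(device)
print(moved_net is net)

# Tensor.to() returns a tensor on the target device; the result must be
# assigned, otherwise x keeps pointing at the original CPU tensor.
x = x.to(device)

out = net(x)
print(out.shape, out.device.type)
```

Keeping a single `device` variable and moving both the model and every input batch through it avoids mixing CPU and CUDA tensors in the same call.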

tensor([[-0.1588,  0.0680],
        [ 0.1514,  0.2078],
        [-0.2272, -0.2835],
        [-0.1105,  0.0585],
        [-0.2300,  0.2517],
        [-0.2497, -0.1019],
        [ 0.1357, -0.0475],
        [-0.0341, -0.3267],
        [-0.0207, -0.0451],
        [-0.4821, -0.0107],
        [-0.1779,  0.1247],
        [ 0.1281,  0.1830],
        [-0.0595, -0.1259],
        [-0.0545,  0.1838],
        [-0.0033, -0.1353],
        [ 0.0098, -0.0957]], device='cuda:0', grad_fn=<AddmmBackward>)
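
The 16 rows of two values each correspond to the batch of 16 inputs and the two outputs of `fc2`. The `16 * 16*16 * 64` input size of `fc1` can be checked with the standard output-size formula for convolution and pooling layers. The sketch below uses plain Python and assumes dilation 1; `conv_out` and `pool_out` are hypothetical helper names, not PyTorch functions.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution (dilation 1 assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output length along one axis of a max-pooling layer without padding."""
    return (size - kernel) // stride + 1


side = 64                                              # input is 64 x 64 x 64
side = conv_out(side, kernel=5, stride=1, padding=2)   # layer1 conv keeps 64
side = pool_out(side, kernel=2, stride=2)              # layer1 pool: 32
side = conv_out(side, kernel=5, stride=1, padding=2)   # layer2 conv keeps 32
side = pool_out(side, kernel=2, stride=2)              # layer2 pool: 16

flat_features = 64 * side ** 3                         # 64 channels after layer2
print(side, flat_features)
```

If the input resolution or the number of pooling stages changes, the first argument of `nn.Linear` in `fc1` has to be recomputed the same way.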

This post covers debugging - PyTorch error: Could not run 'aten::slow_conv3d_forward' with arguments from the 'CUDATensorId' backend. A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60563115/
