
linux - torch OSError : [Errno 28] No space left on device

Reposted. Author: 行者123. Updated: 2023-12-05 06:52:34

I am using an Ubuntu 18 Docker container.

$ cat /etc/lsb-release

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.4 LTS"

When I try to train the resnext101 model from torchvision, I get the following error.

Downloading: "https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth" to /home/vmuser/.cache/torch/hub/checkpoints/resnext101_32x8d-8ba56ff5.pth
0%| | 0.00/340M [00:00<?, ?B/s]
Traceback (most recent call last):
  File "train_attn_best_config.py", line 377, in <module>
    tabct = TabCT(cnn = model, fc_dim = fd, attn_filters = af, n_attn_layers = nal).to(gpu)
  File "train_attn_best_config.py", line 219, in __init__
    self.ct_cnn = cnn_dict[cnn](pretrained = True)
  File "/home/vmuser/anaconda3/envs/pulmo/lib/python3.7/site-packages/torchvision/models/resnet.py", line 317, in resnext101_32x8d
    pretrained, progress, **kwargs)
  File "/home/vmuser/anaconda3/envs/pulmo/lib/python3.7/site-packages/torchvision/models/resnet.py", line 227, in _resnet
    progress=progress)
  File "/home/vmuser/anaconda3/envs/pulmo/lib/python3.7/site-packages/torch/hub.py", line 481, in load_state_dict_from_url
    download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/home/vmuser/anaconda3/envs/pulmo/lib/python3.7/site-packages/torch/hub.py", line 404, in download_url_to_file
    f.write(buffer)
  File "/home/vmuser/anaconda3/envs/pulmo/lib/python3.7/tempfile.py", line 481, in func_wrapper
    return func(*args, **kwargs)
OSError: [Errno 28] No space left on device

When I run df, I get the output below; one of my tmpfs mounts is only 65 MB. I tried running export TMPDIR=/var/tmp and export TMPDIR=~/Data/tmp.

$ df

Filesystem      1K-blocks       Used Available Use% Mounted on
overlay        1797272568 1705953392         0 100% /
tmpfs               65536          0     65536   0% /dev
tmpfs            98346264          0  98346264   0% /sys/fs/cgroup
/dev/sda6      1797272568 1705953392         0 100% /etc/hosts
shm                 65536          0     65536   0% /dev/shm
/dev/sdb1      1845816492 1362932848 389098592  78% /home/vmuser/Data
tmpfs            98346264         12  98346252   1% /proc/driver/nvidia
tmpfs            19669256      93256  19576000   1% /run/nvidia-persistenced/socket
udev             98318592          0  98318592   0% /dev/nvidia1
tmpfs            98346264          0  98346264   0% /proc/acpi
tmpfs            98346264          0  98346264   0% /proc/scsi
tmpfs            98346264          0  98346264   0% /sys/firmware

But the error still persists.
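A note on why changing TMPDIR may not help here: the download message shows the checkpoint going to /home/vmuser/.cache/torch/hub/checkpoints, and df reports / at 100% with 0 blocks available. As far as I know, torch.hub writes its temporary download file next to the destination rather than under TMPDIR, but treat that as an assumption and check the installed hub.py. The sketch below checks free space for candidate cache locations and points TORCH_HOME at the first one with room. The ~/Data/torch_cache directory and the 512 MiB threshold are made up for illustration, and whether ~/.cache really sits on the full overlay filesystem is something to verify on the actual machine.

```python
import os
import shutil
from pathlib import Path


def free_mib(path):
    """Return free space in MiB on the filesystem holding `path`.

    Walks up to the nearest existing ancestor, so cache directories
    that have not been created yet can be checked too.
    """
    p = Path(path).expanduser().resolve()
    while not p.exists():
        p = p.parent
    return shutil.disk_usage(p).free / (1024 * 1024)


def pick_torch_home(candidates, needed_mib):
    """Return the first candidate with at least `needed_mib` free, else None."""
    for candidate in candidates:
        if free_mib(candidate) >= needed_mib:
            return str(Path(candidate).expanduser())
    return None


if __name__ == "__main__":
    # First path is the default cache seen in the traceback; the second is
    # a hypothetical directory on the /home/vmuser/Data volume.
    candidates = ["~/.cache/torch", "~/Data/torch_cache"]
    for c in candidates:
        print(f"{c}: {free_mib(c):.0f} MiB free")

    # The checkpoint is about 340 MB; 512 MiB is an arbitrary safety margin.
    chosen = pick_torch_home(candidates, needed_mib=512)
    if chosen is not None:
        # Set this before torch downloads anything; torch.hub then caches
        # checkpoints under $TORCH_HOME/hub/checkpoints.
        os.environ["TORCH_HOME"] = chosen
        print("TORCH_HOME =", chosen)
```

The same effect can be had by exporting TORCH_HOME in the shell before launching the training script.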

Best answer

This looks like a shared memory (shm) problem.
Try running the Docker container with the --ipc=host flag.

For details, see this thread.
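One way to check whether the flag took effect is to look at the size of /dev/shm from inside the container. This is only a rough sketch: the helper name shm_total_mib is made up for illustration. Docker's default /dev/shm is 64 MiB, which matches the 65536 1K-blocks in the df output above. With --ipc=host the container should see the host's /dev/shm instead.

```python
import os
import shutil


def shm_total_mib(path="/dev/shm"):
    """Return the total size in MiB of the filesystem at `path`, or None if absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total / (1024 * 1024)


if __name__ == "__main__":
    size = shm_total_mib()
    if size is None:
        print("/dev/shm not found")
    else:
        # About 64 MiB suggests the container is still on Docker's default shm.
        print(f"/dev/shm total: {size:.0f} MiB")
```

If sharing the host IPC namespace is not desirable, docker run also accepts --shm-size to enlarge the container's own /dev/shm.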

Regarding "linux - torch OSError : [Errno 28] No space left on device", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65926311/
