python - Segmentation fault when training DeepLab on Cityscapes

Reposted. Author: 行者123. Last updated: 2023-12-04 10:59:16

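I am currently following the steps to train DeepLab with the xception_65 backbone on the Cityscapes dataset, but unfortunately I get a segmentation fault, and I have not been able to track down the cause. Training on the PASCAL dataset, for example, works fine. I have checked the paths and tried several versions and combinations of TensorFlow, drivers, and so on. Even when I run the train.py script without GPU support, I get the same segmentation fault. I followed the same steps on another PC and it worked there. Does anyone know what the problem is?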

My setup:

  • Ubuntu 18.04
  • NVIDIA RTX 2080, driver version 430.65 (installed with the .run file)
  • CUDA 10.0 (installed with the .run file)
  • cudnn 7.6.5
  • python 3.6
  • tensorflow 1.15
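The traceback further down shows files under `anaconda3/envs/conda-tf/lib/python3.7`, while this list says Python 3.6, so it may be worth confirming which interpreter and TensorFlow build actually run `train.py`. A minimal sketch using only the standard library (the helper name `describe_environment` is hypothetical); TensorFlow is imported optionally so the script also runs where it is not installed:

```python
import importlib
import platform
import sys


def describe_environment():
    """Collect interpreter and TensorFlow details for the running process."""
    info = {
        "executable": sys.executable,  # path of the python binary actually running
        "python_version": platform.python_version(),
    }
    try:
        tf = importlib.import_module("tensorflow")
        info["tensorflow_version"] = tf.__version__
        info["tensorflow_path"] = tf.__file__  # which site-packages it came from
    except ImportError:
        # TensorFlow is not importable from this interpreter.
        info["tensorflow_version"] = None
        info["tensorflow_path"] = None
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("%s: %s" % (key, value))
```

Running it with the same `python3` that launches `train.py` shows whether that is the conda environment's interpreter and which TensorFlow it imports.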

Training is started with:

python3 "${WORK_DIR}"/train.py \
--logtostderr \
--training_number_of_steps=${NUM_ITERATIONS} \
--train_split="train_fine" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size="769,769" \
--train_batch_size=1 \
--fine_tune_batch_norm=False \
--dataset="cityscapes" \
--tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_cityscapes_train/model.ckpt" \
--train_logdir="${TRAIN_LOGDIR}" \
--dataset_dir="${CITYSCAPES_DATASET}"
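Because the crash in the log below happens right after `Starting Queues`, when the input pipeline starts reading data, one cheap check is whether the `--train_split` value matches any tfrecord shards in `--dataset_dir` at all. A minimal sketch, assuming the shards are named `<split>-NNNNN-of-NNNNN.tfrecord` (like the `train-00000-of-00010.tfrecord` files mentioned in the answer) and that the loader globs for `<split>-*`; the helper name `find_split_shards` is hypothetical:

```python
import glob
import os
import sys


def find_split_shards(dataset_dir, split_name):
    """Return the sorted files in dataset_dir whose names start with '<split_name>-'."""
    # Assumed pattern '<split>-*': 'train_fine-*' does NOT match 'train-00000-of-00010.tfrecord'.
    pattern = os.path.join(dataset_dir, "%s-*" % split_name)
    return sorted(glob.glob(pattern))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 check_shards.py <dataset_dir> <split_name>
    shards = find_split_shards(sys.argv[1], sys.argv[2])
    if not shards:
        print("No shards match split %r in %s" % (sys.argv[2], sys.argv[1]))
    for path in shards:
        print(path)
```

For example, `python3 check_shards.py "${CITYSCAPES_DATASET}" train_fine` (the script name is illustrative) printing nothing but the "No shards match" line would point at a naming mismatch rather than at the GPU stack.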

I get the following output:

I1119 16:52:49.856512 139832269989696 learning.py:768] Starting Queues.
Fatal Python error: Segmentation fault

Thread 0x00007f2cd086b700 (most recent call first):
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/threading.py", line 296 in wait
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/queue.py", line 170 in get
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/site-packages/tensorflow_core/python/summary/writer/event_file_writer.py", line 159 in run
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/threading.py", line 926 in _bootstrap_inner
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/threading.py", line 890 in _bootstrap

Thread 0x00007f2d3cc7e740 (most recent call first):
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443 in _call_tf_sessionrun
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350 in _run_fn
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365 in _do_call
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359 in _do_run
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180 in _run
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956 in run
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/site-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 490 in train_step
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/site-packages/tensorflow_core/contrib/slim/python/slim/learning.py", line 775 in train
File "/home/kuschnig/tensorflow/models/research/deeplab/train.py", line 466 in main
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/site-packages/absl/app.py", line 250 in _run_main
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/site-packages/absl/app.py", line 299 in run
File "/home/kuschnig/anaconda3/envs/conda-tf/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40 in run
File "/home/kuschnig/tensorflow/models/research/deeplab/train.py", line 472 in <module>
Segmentation fault (core dumped)
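The `Fatal Python error: Segmentation fault` header followed by one stack per thread is the format printed by Python's standard `faulthandler` module. If a run crashes without such a dump, it can be turned on with `python3 -X faulthandler train.py ...`, with the `PYTHONFAULTHANDLER` environment variable, or programmatically as in this minimal sketch (putting it at the top of `train.py` is an assumption about where you would want it):

```python
import faulthandler
import sys

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL (e.g. a crash inside a C extension
# such as TensorFlow), write the Python stack of every thread to stderr.
faulthandler.enable(file=sys.stderr, all_threads=True)
```

The dump only contains Python frames; the native frames inside TensorFlow still need a debugger such as gdb.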

The gdb backtrace is here: GDB Output

Best answer

I ran into the same problem as described and managed to solve it by doing two things:

  1. Make sure the name of your tfrecords (mine were named train-00000-of-00010.tfrecord) matches the value passed as --train_split="train".
  2. In data_generator.py, line 72, change splits_to_sizes={'train_fine': 2975 to splits_to_sizes={'train': 2975.

The trick is to use the same name (for me, train) in the .sh script that launches training, in data_generator.py, and in your tfrecord folder.
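The fix comes down to one split name appearing consistently in three places. A minimal sketch of that check; `check_split_consistency` and its arguments are hypothetical illustrations, not part of DeepLab, and the `<split>-NNNNN-of-NNNNN.tfrecord` naming scheme is assumed from the file name in the answer:

```python
import os


def check_split_consistency(train_split, splits_to_sizes, tfrecord_names):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    # 1. The --train_split value must be a key of the dataset's splits_to_sizes.
    if train_split not in splits_to_sizes:
        problems.append(
            "split %r is not a key of splits_to_sizes %s"
            % (train_split, sorted(splits_to_sizes))
        )
    # 2. Some tfrecord file must start with '<split>-' (assumed naming scheme).
    prefixes = {os.path.basename(n).split("-", 1)[0] for n in tfrecord_names}
    if train_split not in prefixes:
        problems.append(
            "no tfrecord file name starts with '%s-' (found prefixes %s)"
            % (train_split, sorted(prefixes))
        )
    return problems


if __name__ == "__main__":
    shards = ["train-00000-of-00010.tfrecord"]
    # Original setup: flag and dict say 'train_fine', files say 'train' -> one problem.
    print(check_split_consistency("train_fine", {"train_fine": 2975}, shards))
    # After the fix: 'train' everywhere -> no problems.
    print(check_split_consistency("train", {"train": 2975}, shards))
```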

For more on python - Segmentation fault when training DeepLab on Cityscapes, see the similar question on Stack Overflow: https://stackoverflow.com/questions/58938886/
