
linux - TensorFlow multi-GPU InvalidArgumentError: cifar10_multi_gpu.py

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 01:19:27

I am trying to train my model on multiple GPUs, so I ran cifar10_multi_gpu_train.py (https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py).

1. My environment:


OS platform: Linux version 3.10.0-327.el7.x86_64
TensorFlow installed with: pip install --upgrade ./tensorflow_gpu-1.0.0rc0-cp35-cp35m-linux_x86_64.whl
Python version: Python 3.5.2
CUDA/cuDNN version: cuda_8.0.61_375.26_linux.run / cudnn-8.0-linux-x64-v5.1.tgz

2. The GPU setup works:

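import tensorflow as tf

# Pin both constants to the CPU.
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0], shape=[3], name='a')
    b = tf.constant([1.0, 2.0, 3.0], shape=[3], name='b')

# Place the addition on the second GPU.
with tf.device('/gpu:1'):
    c = a + b

# log_device_placement=True logs the device chosen for every op.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.run(c)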

add: (Add): /job:localhost/replica:0/task:0/gpu:1
I tensorflow/core/common_runtime/simple_placer.cc:841] add: (Add)/job:localhost/replica:0/task:0/gpu:1
b: (Const): /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:841] b: (Const)/job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:841] a: (Const)/job:localhost/replica:0/task:0/cpu:0

array([ 2., 4., 6.], dtype=float32)
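The check above only requests /gpu:1, so it says nothing about /gpu:0, which is the device the failing tower_0 ops are pinned to. Below is a minimal sketch, assuming the same TensorFlow 1.x graph/session API used in the question, that repeats the test on the CPU and on every GPU index starting from 0, with soft placement explicitly disabled so that any placement problem surfaces as an error. The helper names candidate_devices and probe_devices are hypothetical.

```python
# Sketch assuming TensorFlow 1.x (tf.Session / tf.ConfigProto), as in the question.


def candidate_devices(num_gpus):
    """Return the device strings to probe: the CPU plus /gpu:0 .. /gpu:N-1."""
    return ['/cpu:0'] + ['/gpu:%d' % i for i in range(num_gpus)]


def probe_devices(num_gpus):
    """Run a tiny add on each device with soft placement disabled.

    Returns a dict mapping each device string to True if the op ran on the
    explicitly requested device, or False if TensorFlow raised a placement error.
    """
    import tensorflow as tf  # imported lazily so candidate_devices works without TF

    results = {}
    for dev in candidate_devices(num_gpus):
        graph = tf.Graph()  # fresh graph per device so ops do not interfere
        with graph.as_default():
            with tf.device(dev):
                a = tf.constant([1.0, 2.0, 3.0], name='a')
                b = tf.constant([1.0, 2.0, 3.0], name='b')
                c = a + b
            # Hard placement: an unusable device raises instead of falling back.
            config = tf.ConfigProto(log_device_placement=True,
                                    allow_soft_placement=False)
            try:
                with tf.Session(graph=graph, config=config) as sess:
                    sess.run(c)
                results[dev] = True
            except tf.errors.InvalidArgumentError:
                results[dev] = False
    return results
```

On a two-GPU machine, calling probe_devices(2) should report True for every device that can run the add op when requested explicitly; a False entry would point at a device that cannot be used that way. The TensorFlow import is kept inside probe_devices so that the pure-Python helper stays importable on its own.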

3. InvalidArgumentError: python cifar10_multi_gpu_train.py

I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /GPU:0 for node 'tower_0/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /CPU:0

Traceback (most recent call last):
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1000, in _run_fn
    self._extend_graph()
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1049, in _extend_graph
    self._session, graph_def.SerializeToString(), status)
  File "/home/xx/anaconda3/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device to node 'tower_0/softmax_linear/weight_loss_1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
	 [[Node: tower_0/softmax_linear/weight_loss_1 = ScalarSummary[T=DT_FLOAT, _device="/device:GPU:0"](tower_0/softmax_linear/weight_loss_1/tags, tower_0/softmax_linear/weight_loss)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cifar10_multi_gpu_train.py", line 280, in <module>
    tf.app.run()
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "cifar10_multi_gpu_train.py", line 276, in main
    train()
  File "cifar10_multi_gpu_train.py", line 237, in train
    sess.run(init)
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/home/xx/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device to node 'tower_0/softmax_linear/weight_loss_1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
	 [[Node: tower_0/softmax_linear/weight_loss_1 = ScalarSummary[T=DT_FLOAT, _device="/device:GPU:0"](tower_0/softmax_linear/weight_loss_1/tags, tower_0/softmax_linear/weight_loss)]]

I have tried many solutions, but none of them worked. Thanks in advance for any suggestions.

Best answer

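Sorry you're running into trouble! I asked one of the original authors of the script, and this was his reply:

It looks like device placement is not working as expected.

  • In the poster's test, they verified that 'cpu:0' and 'gpu:1' are accessible, but never checked 'gpu:0'. I would check that.

  • The poster should also set allow_soft_placement=True in the session's ConfigProto to allow soft device placement.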
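The error message names the failing op as a ScalarSummary pinned to /device:GPU:0 with no GPU kernel available. With allow_soft_placement=True in tf.ConfigProto, TensorFlow places such an op on a device that can run it (here the CPU) instead of raising. Below is a minimal sketch, assuming TensorFlow 1.x. The helpers soft_placement_kwargs and run_tower_summary_demo are hypothetical, and the summary only mimics the tower_0/softmax_linear/weight_loss summary from the traceback.

```python
# Sketch assuming TensorFlow 1.x (tf.Session / tf.ConfigProto / tf.summary.scalar).


def soft_placement_kwargs(log_device_placement=False):
    """Keyword arguments for tf.ConfigProto that enable soft placement.

    allow_soft_placement=True lets TensorFlow move an op to another device
    (e.g. the CPU) when the requested device has no kernel for it.
    """
    return {'allow_soft_placement': True,
            'log_device_placement': log_device_placement}


def run_tower_summary_demo():
    """Build a scalar summary inside a /gpu:0 scope and run it with soft placement."""
    import tensorflow as tf  # imported lazily; TF 1.x API assumed

    with tf.Graph().as_default() as graph:
        with tf.device('/gpu:0'):
            # Mimics a per-tower loss summary requested on the first GPU.
            loss = tf.constant(0.5, name='weight_loss')
            summary = tf.summary.scalar('weight_loss', loss)
        config = tf.ConfigProto(**soft_placement_kwargs(log_device_placement=True))
        with tf.Session(graph=graph, config=config) as sess:
            return sess.run(summary)  # serialized Summary protobuf as bytes
```

In the training script itself, the same idea amounts to passing tf.ConfigProto(allow_soft_placement=True, ...) as the config argument when the tf.Session is created. With log_device_placement=True, the log should then show the summary op on the CPU rather than on GPU:0.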

Regarding linux - TensorFlow multi-GPU InvalidArgumentError: cifar10_multi_gpu.py, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45648322/
