
python - Error when training on multiple GPUs with TensorFlow 2.0: Out of range: End of sequence

Repost. Author: 行者123. Updated: 2023-12-01 17:30:45

I am training with TensorFlow 2.0 on multiple GPUs and get the error below. If I use only one GPU, it runs without any error. My TensorFlow version is tensorflow-gpu-2.0.0:

tensorflow.python.framework.errors_impl.CancelledError: 4 root error(s) found.
(0) Cancelled: Operation was cancelled
[[{{node cond_6/else/_59/IteratorGetNext}}]]
(1) Out of range: End of sequence
[[{{node cond_4/else/_37/IteratorGetNext}}]]
(2) Out of range: End of sequence
[[{{node cond_7/else/_70/IteratorGetNext}}]]
[[metrics/accuracy/div_no_nan/ReadVariableOp_6/_154]]
(3) Out of range: End of sequence
[[{{node cond_7/else/_70/IteratorGetNext}}]]
0 successful operations.
1 derived errors ignored. [Op:__inference_distributed_function_83325]
Function call stack:
distributed_function -> distributed_function -> distributed_function -> distributed_function

Here is my code. You can run it with the environment variable CUDA_VISIBLE_DEVICES=0 or CUDA_VISIBLE_DEVICES=0,1; the two settings give different results:

import tensorflow as tf
import tensorflow_datasets as tfds

data_name = 'uc_merced'
dataset = tfds.load(data_name)
train_data, test_data = dataset['train'], dataset['train']

def parse(img_dict):
    # Resize (with padding) to the input size the model expects
    img = tf.image.resize_with_pad(img_dict['image'], 256, 256)
    label = img_dict['label']
    return img, label

train_data = train_data.map(parse)
train_data = train_data.batch(96)

test_data = test_data.map(parse)
test_data = test_data.batch(96)

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.applications.ResNet50(weights=None, classes=21, input_shape=(256, 256, 3))
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])


model.fit(train_data, epochs=50, verbose=2, validation_data=test_data)
model.save('model/resnet_{}.h5'.format(data_name))
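As a reading aid for the pipeline above, the batch arithmetic can be sketched in plain Python. This is a minimal sketch under stated assumptions, not a diagnosis of the error: the figure of 2,100 training images is the published size of UC Merced (21 classes of 100 images) and is not stated in the post, the replica count of 2 corresponds to CUDA_VISIBLE_DEVICES=0,1, and `batch_sizes` is a hypothetical helper that mimics what `batch(96)` yields.

```python
def batch_sizes(num_examples, batch_size, drop_remainder=False):
    """Return the size of each batch a finite dataset would yield."""
    full, rest = divmod(num_examples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_remainder:
        sizes.append(rest)  # smaller final batch
    return sizes


NUM_EXAMPLES = 2100  # assumption: UC Merced size, not given in the post
GLOBAL_BATCH = 96    # from train_data.batch(96)
NUM_REPLICAS = 2     # assumption: CUDA_VISIBLE_DEVICES=0,1

sizes = batch_sizes(NUM_EXAMPLES, GLOBAL_BATCH)
steps_per_epoch = len(sizes)                  # batches in one pass
last_batch = sizes[-1]                        # size of the final batch
per_replica = GLOBAL_BATCH // NUM_REPLICAS    # share of a full batch per GPU
steps_if_dropped = len(batch_sizes(NUM_EXAMPLES, GLOBAL_BATCH, drop_remainder=True))
```

Under these assumptions the dataset ends after a partial final batch. If one wanted to experiment with an explicit step count, `steps_per_epoch` is the number that would accompany `train_data.repeat()` in `model.fit`; whether that avoids the error in this setup is untested here.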

Best answer

Instead of selecting GPUs with CUDA_VISIBLE_DEVICES, you can try the following:

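# devices is an argument of the MirroredStrategy constructor; scope() takes none
strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
with strategy.scope():
    ...  # build and compile the model here, as in the question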
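The device names given to MirroredStrategy refer to the GPUs the process can see, and CUDA renumbers visible devices from 0, so the two selection methods interact. The sketch below illustrates that mapping in plain Python. `visible_gpu_names` is a hypothetical helper, not part of TensorFlow, and it assumes every ID listed in CUDA_VISIBLE_DEVICES is a valid device.

```python
def visible_gpu_names(cuda_visible_devices):
    """Return TensorFlow-style names for the GPUs a process can see.

    Hypothetical helper, not part of TensorFlow. Assumes every listed ID
    is valid, so N listed IDs become /gpu:0 ... /gpu:N-1.
    """
    ids = [part.strip() for part in cuda_visible_devices.split(",")]
    ids = [part for part in ids if part]  # ignore empty entries
    return ["/gpu:{}".format(index) for index in range(len(ids))]


single = visible_gpu_names("0")        # one GPU visible
pair = visible_gpu_names("0,1")        # two GPUs visible
renumbered = visible_gpu_names("2,3")  # physical GPUs 2 and 3, renumbered from 0
```

A list built this way could be passed as `tf.distribute.MirroredStrategy(devices=...)`, so the strategy names only devices that actually exist in the process.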

For "python - Error when training on multiple GPUs with TensorFlow 2.0: Out of range: End of sequence", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58869351/
