python - Error when trying to use MirroredStrategy with tf.estimator

I am trying to add multi-GPU support to my TensorFlow training code by passing tf.contrib.distribute.MirroredStrategy as an argument to tf.estimator.RunConfig.

TensorFlow version: 1.7 (built from source)

Python version: 3.5

OS platform and version: Linux Ubuntu 16.04.2

I get the following error message:

Traceback (most recent call last):
  File "python3.5/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "python3.5/site-packages/tensorflow/contrib/distribute/python/mirrored_strategy.py", line 248, in _call_for_each_tower
    self, *merge_args, **merge_kwargs)
  File "python3.5/site-packages/tensorflow/python/training/optimizer.py", line 667, in _distributed_apply
    reduced_grads = distribution.batch_reduce("sum", grads_and_vars)
  File "python3.5/site-packages/tensorflow/python/training/distribute.py", line 801, in batch_reduce
    return self._batch_reduce(method_string, value_destination_pairs)
  File "python3.5/site-packages/tensorflow/contrib/distribute/python/mirrored_strategy.py", line 295, in _batch_reduce
    value_destination_pairs)
  File "python3.5/site-packages/tensorflow/contrib/distribute/python/cross_tower_ops.py", line 169, in batch_reduce
    raise ValueError("`value_destination_pairs` must be a list or a tuple of "
ValueError: `value_destination_pairs` must be a list or a tuple of tuples of PerDevice objects and destinations

The following code reproduces the error (I have omitted the code that parses a tfrecord into an image tensor, since I don't think it affects the error, but I can add it if needed):

import glob, os
import tensorflow as tf
slim = tf.contrib.slim

# ...
# definition of args (arguments parser)

def input_fn():

    dataset = tf.data.TFRecordDataset(glob.glob(os.path.join(args.train_data_dir, 'train*')))
    dataset = dataset.map(
        lambda x: parse_and_preprocess_image(x, args.image_size),
        num_parallel_calls=2,
    )
    dataset = dataset.repeat()
    dataset = dataset.batch(batch_size=4)
    dataset = dataset.prefetch(1)

    return dataset


def model_fn(features, labels=None, mode=tf.estimator.ModeKeys.TRAIN, params=None):

    train_images_batch = features
    res = slim.conv2d(inputs=train_images_batch, kernel_size=9, stride=1, num_outputs=3, scope='conv1')
    loss = tf.reduce_mean((train_images_batch - res) ** 2)
    optimizer = tf.train.AdamOptimizer(0.001)
    train_op = slim.learning.create_train_op(loss, optimizer)
    return tf.estimator.EstimatorSpec(
        mode=tf.estimator.ModeKeys.TRAIN,
        loss=loss, train_op=train_op)


def train():

    init()

    distribution = tf.contrib.distribute.MirroredStrategy(num_gpus=args.num_gpus)

    config = tf.estimator.RunConfig(
        model_dir=args.log_dir,
        train_distribute=distribution,
    )

    estimator = tf.estimator.Estimator(model_fn=model_fn, config=config)
    estimator.train(
        input_fn=input_fn,
        max_steps=args.train_steps,
    )


def main():
    add_arguments()
    train()


if __name__ == '__main__':
    main()

Thanks!

Best answer

This error occurs if you specify num_gpus=1. For a single GPU you can use OneDeviceStrategy("/device:GPU:0") instead of MirroredStrategy.
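
A minimal sketch of how train() could pick the strategy based on the GPU count (an illustration, not from the original post; it assumes OneDeviceStrategy is exposed under tf.contrib.distribute in this TensorFlow version, and reuses args.num_gpus and args.log_dir from the question's argument parser):

if args.num_gpus > 1:
    # Two or more GPUs: mirror the model across them.
    distribution = tf.contrib.distribute.MirroredStrategy(num_gpus=args.num_gpus)
else:
    # Single GPU: MirroredStrategy(num_gpus=1) triggers the ValueError above,
    # so pin everything to one device instead (assumed contrib module path).
    distribution = tf.contrib.distribute.OneDeviceStrategy("/device:GPU:0")

config = tf.estimator.RunConfig(
    model_dir=args.log_dir,
    train_distribute=distribution,
)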

Regarding "python - Error when trying to use MirroredStrategy with tf.estimator", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49805955/
