
python - Tensorflow - Replicating a model on multiple GPUs with shared variables?

Reposted · Author: 行者123 · Updated: 2023-11-28 19:07:51

I am trying to run a simple feed-forward network on multiple GPUs (with asynchronous updates to shared weights).

However, I cannot get the weights to be shared.

From the research I've done, all I should need is to set reuse=True on the variable_scope, but that doesn't seem to work:

for i_, gpu_id in enumerate(gpus):
    with tf.device(gpu_id):
        # [Build graph in here.]
        with variable_scope.variable_scope(variable_scope.get_variable_scope(), reuse=i_ > 0):
            x = tf.placeholder(tf.float32, [None, 784])
            W = tf.Variable(tf.zeros([784, 10]))
            b = tf.Variable(tf.zeros([10]))
            y = tf.nn.softmax(tf.matmul(x, W) + b)
            y_ = tf.placeholder(tf.float32, [None, 10])

            # More code..., see pastebin link below


# Start an interactive tensorflow session
sess = tf.Session()

# Initialize all variables associated with this session
sess.run(tf.initialize_all_variables())

The above is a code sample; in the full code (https://pastebin.com/i4NBnHHC) I show that training on one GPU does not update the weights on the other GPUs.
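One likely culprit in the snippet above: in TF 1.x, a scope's reuse=True flag only affects variables created through tf.get_variable; tf.Variable, as used here, always creates a brand-new variable regardless of the enclosing scope. A minimal sketch of actual sharing via tf.get_variable (assuming the tensorflow.compat.v1 shim on a modern install; device placement omitted for brevity):

```python
# Sharing weights across towers via tf.get_variable, which honours
# the enclosing scope's reuse flag (tf.Variable does not).
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

def tower(x):
    W = tf.get_variable('W', [784, 10], initializer=tf.zeros_initializer())
    b = tf.get_variable('b', [10], initializer=tf.zeros_initializer())
    return tf.nn.softmax(tf.matmul(x, W) + b)

towers = []
for i in range(2):  # one tower per device; tf.device(...) omitted here
    with tf.variable_scope('model', reuse=(i > 0)):
        x = tf.placeholder(tf.float32, [None, 784])
        towers.append(tower(x))

# Only one W and one b exist; both towers share them.
print(len(tf.global_variables()))  # prints 2
```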

Best Answer

The easiest solution is to use in-graph replication:

In-graph replication. In this approach, the client builds a single tf.Graph that contains one set of parameters (in tf.Variable nodes pinned to /job:ps); and multiple copies of the compute-intensive part of the model, each pinned to a different task in /job:worker.

To do this, you simply place the parameters (placeholders and variables) on the CPU device:

# in-graph replication
import tensorflow as tf

num_gpus = 2

# place the initial data on the cpu
with tf.device('/cpu:0'):
    input_data = tf.Variable([[1., 2., 3.],
                              [4., 5., 6.],
                              [7., 8., 9.],
                              [10., 11., 12.]])
    b = tf.Variable([[1.], [1.], [2.]])

# split the data into chunks for each gpu
inputs = tf.split(input_data, num_gpus)
outputs = []

# loop over available gpus and pass input data
for i in range(num_gpus):
    with tf.device('/gpu:' + str(i)):
        outputs.append(tf.matmul(inputs[i], b))

# merge the results of the devices
with tf.device('/cpu:0'):
    output = tf.concat(outputs, axis=0)

# create a session and run
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(output))
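To see what the graph above computes without needing TensorFlow at all, the same split/compute/concat data flow can be mimicked in plain Python (hypothetical helper names, just the arithmetic, no actual device placement):

```python
# Conceptual sketch of in-graph replication's data flow, using plain
# Python lists in place of tensors.
def split(data, n):
    """Split `data` into n equal chunks (mirrors tf.split along axis 0)."""
    k = len(data) // n
    return [data[i * k:(i + 1) * k] for i in range(n)]

def matvec(rows, b):
    """Matrix-vector product, mirroring tf.matmul(inputs[i], b)."""
    return [[sum(x * w for x, w in zip(row, b))] for row in rows]

input_data = [[1., 2., 3.], [4., 5., 6.], [7., 8., 9.], [10., 11., 12.]]
b = [1., 1., 2.]

chunks = split(input_data, 2)              # one chunk per "GPU"
outputs = [matvec(c, b) for c in chunks]   # each device computes its share
output = [row for out in outputs for row in out]  # tf.concat equivalent
print(output)  # [[9.0], [21.0], [33.0], [45.0]]
```

The key point is that every replica reads the same parameter b, while the input rows are partitioned across devices.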

A more involved approach, called between-graph replication, is generally preferred in the TensorFlow community, but it requires a more complex setup using tf.train.ClusterSpec. You can find examples in their tutorial on distributed tensorflow.
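As a rough illustration of the extra configuration involved, a between-graph setup starts from a cluster description like the following (the addresses are placeholders; the actual ps and worker programs that consume this spec are beyond this snippet):

```python
import tensorflow as tf

# Describe the cluster: one parameter server holding the shared
# variables, and two workers each running a replica of the graph.
cluster = tf.train.ClusterSpec({
    'ps': ['localhost:2222'],
    'worker': ['localhost:2223', 'localhost:2224'],
})

print(cluster.num_tasks('worker'))  # prints 2
```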

I also recommend this post, which compares the different distribution setups.

Regarding "python - Tensorflow - Replicating a model on multiple GPUs with shared variables?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44854258/
