
machine-learning - TensorFlow: averaging gradients over several batches


This may be a duplicate of Tensorflow: How to get gradients per instance in a batch?. I am asking it anyway, because there is no satisfactory answer yet and the goal here is somewhat different.

I have a very large network that just fits on my GPU, but the largest batch size I can feed it is 32; anything bigger causes the GPU to run out of memory. I would like to use a larger batch to get a more accurate approximation of the gradient.

Concretely, suppose I want to compute the gradient for a large batch of 96 by feeding 3 batches of 32 one after the other. Since the loss is a mean, averaging the three batch gradients gives exactly the gradient of the combined batch. As far as I know, the best way to do this is with Optimizer.compute_gradients() and Optimizer.apply_gradients(). Here is a small example of how that works:

import tensorflow as tf
import numpy as np

learn_rate = 0.1

W_init = np.array([ [1,2,3], [4,5,6], [7,8,9] ], dtype=np.float32)
x_init = np.array([ [11,12,13], [14,15,16], [17,18,19] ], dtype=np.float32)

X = tf.placeholder(dtype=np.float32, name="x")
W = tf.Variable(W_init, dtype=np.float32, name="w")
y = tf.matmul(X, W, name="y")
loss = tf.reduce_mean(y, name="loss")

opt = tf.train.GradientDescentOptimizer(learn_rate)
grad_vars_op = opt.compute_gradients(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Compute the gradients for each batch (each toy "batch" here is a single row)
grads_vars1 = sess.run(grad_vars_op, feed_dict = {X: x_init[None,0]})
grads_vars2 = sess.run(grad_vars_op, feed_dict = {X: x_init[None,1]})
grads_vars3 = sess.run(grad_vars_op, feed_dict = {X: x_init[None,2]})

# Separate the gradients from the variables
grads1 = [ grad for grad, var in grads_vars1 ]
grads2 = [ grad for grad, var in grads_vars2 ]
grads3 = [ grad for grad, var in grads_vars3 ]
varl = [ var for grad, var in grads_vars1 ]

# Average the gradients
grads = [ (g1 + g2 + g3)/3 for g1, g2, g3 in zip(grads1, grads2, grads3)]

sess.run(opt.apply_gradients(zip(grads,varl)))

print("Weights after 1 gradient")
print(sess.run(W))

Now this is all very ugly and inefficient, since the forward passes run on the GPU, averaging the gradients happens on the CPU, and applying them happens on the GPU again.

Moreover, this code raises an exception, because grads is a list of np.arrays; to make it work, one would have to create a tf.placeholder for every gradient.
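For reference, a minimal sketch of that workaround, reusing the grads and varl lists from the code above (grad_phs and apply_op are names introduced here for illustration, not from the original code):

# One placeholder per gradient; the averaged numpy gradients are fed through them.
grad_phs = [tf.placeholder(dtype=np.float32, shape=g.shape) for g in grads]
apply_op = opt.apply_gradients(list(zip(grad_phs, varl)))
sess.run(apply_op, feed_dict=dict(zip(grad_phs, grads)))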

I am sure there must be a better, more efficient way to do this? Any suggestions?

Best Answer

You can create copies of the trainable_variables and accumulate the batch gradients in them. Here are a few simple steps to follow:

...
opt = tf.train.GradientDescentOptimizer(learn_rate)

# number of batches to accumulate before applying the update
n_batches = 3
# constant to scale each batch gradient so the sum becomes an average
const = tf.constant(1.0 / n_batches)
# get all trainable variables
t_vars = tf.trainable_variables()
# create a non-trainable copy of every trainable variable, initialized to 0
accum_tvars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False)
               for tv in t_vars]
# an op to reset all accumulator variables to 0
zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_tvars]

# compute gradients for one batch
batch_grads_vars = opt.compute_gradients(loss, t_vars)
# add each (scaled by const) batch gradient into its accumulator
accum_ops = [accum_tvars[i].assign_add(tf.scalar_mul(const, gv[0]))
             for i, gv in enumerate(batch_grads_vars)]

# apply the accumulated gradients to the original variables
train_step = opt.apply_gradients(
    [(accum_tvars[i], gv[1]) for i, gv in enumerate(batch_grads_vars)])
# equivalently:
# train_step = opt.apply_gradients(list(zip(accum_tvars, [gv[1] for gv in batch_grads_vars])))

while True:
    # reset the accumulated gradients
    sess.run(zero_ops)

    # run n_batches forward/backward passes, accumulating the gradients
    for i in range(n_batches):
        sess.run(accum_ops, feed_dict={X: x_init[None, i]})

    # one parameter update with the averaged gradient
    sess.run(train_step)
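As an optional tidy-up (an addition of mine, not part of the original answer), the per-variable assign ops can be fused into single graph targets with tf.group, so each accumulation and reset step is a single sess.run call:

# Fuse the lists of assign ops into one op each (assumed names accum_op, zero_op).
accum_op = tf.group(*accum_ops)
zero_op = tf.group(*zero_ops)

The key property of this scheme is that the accumulation happens entirely inside the graph, so the gradients never leave the GPU between batches, which avoids the CPU round-trip the question complains about.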

Regarding machine-learning - TensorFlow: averaging gradients over several batches, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45987156/
