python - The apply_gradients() function in TensorFlow does not update the weight and bias variables

Reposted. Author: 行者123. Updated: 2023-11-30 08:37:51
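I am using TensorFlow's compute_gradients() and apply_gradients() functions for backpropagation. Printing the gradient values confirms that the gradients are being computed, but after calling apply_gradients() I see no change in the weights. The value of the global_step variable does not change either.

Am I doing something wrong?

I run the code below in a session, and the gradient values returned by compute_gradients() are printed as expected. However, when I pass the list of (gradient, weight variable) tuples to apply_gradients(), the weight values stay the same and global_step is not updated.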

global_step = tf.Variable(0, trainable=False, dtype=tf.int32)
images = tf.placeholder(dtype=tf.float32, shape=[batch_size, None, None, 3])
out_locs = tf.placeholder(dtype=tf.float32, shape=[None, 2])
org_gt_coords = tf.placeholder(dtype=tf.float32, shape=[batch_size, 2])

res_aux = inference(images,out_locs,org_gt_coords)

ret_dict = train(res_aux, global_step)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    writer = tf.summary.FileWriter('./graphs', sess.graph)
    sess.run(init)

    for epoch in xrange(max_steps):
        start_time = time.time()
        anno_file_batch_rows = getImageMetaRecords()
        print('epoch: ', epoch)

        for batch in xrange(len(anno_file_batch_rows)/batch_size):
            distorted_images, meta = cdhd_input.distorted_inputs(stats_dict, batch_size, \
                anno_file_batch_rows[batch * batch_size : (batch * batch_size) + batch_size])

            out_dict = sess.run(ret_dict, feed_dict=
                {images: distorted_images,
                 out_locs: meta['out_locs'],
                 org_gt_coords: meta['org_gt_coords']})

def inference(images,out_locs,org_gt_coords):
    # conv1
    with tf.variable_scope('conv1') as scope:
        kernel = _variable_with_weight_decay('weights',
                                             shape=[3, 3, 3, 32],
                                             stddev=1, #check if this is right
                                             wd=0.0)
        kernel = tf.multiply(kernel, 0.2722) #line 321-325 in warpTrainCNNCDHDCentroidChainGridPredSharedRevFastExp3
        conv = tf.nn.conv2d(images, kernel, [1, 2, 2, 1], padding='VALID')
        biases = _variable_on_cpu('biases', [32], tf.constant_initializer(1.0))
        pre_activation = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(pre_activation, name=scope.name)

    # conv2
    with tf.variable_scope('conv2') as scope:
        kernel = _variable_with_weight_decay('weights',
                                             shape=[3, 3, 32, 64],
                                             stddev=1,
                                             wd=0.0)
        kernel = tf.multiply(kernel, 0.0833) #line 321-325 in warpTrainCNNCDHDCentroidChainGridPredSharedRevFastExp3
        conv = tf.nn.conv2d(conv1, kernel, [1, 2, 2, 1], padding='VALID')
        biases = _variable_on_cpu('biases', [64], tf.constant_initializer(1.0))
        pre_activation = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(pre_activation, name=scope.name)

    ...
    ...
    # more layers
    ...
    ...

    return res_aux

def train(res_aux, global_step):
    ...
    ...
    # code here to process res_aux and calculate loss
    ...
    ...

    opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)
    grads_and_vars = opt.compute_gradients(loss, tf.get_collection('weights'))
    #printing shows real valued gradient and weight values
    apply_gradients(grads_and_vars, global_step=global_step)
    #printing same weight values shows no change in weight values. Gradients are not applied to the weights

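Best answer

This line only defines the op that applies the gradients; it does not execute it:

a_optimizer_col_2.apply_gradients(grad_var_2, global_step=global_step)

To actually apply the gradients, keep a reference to the op that apply_gradients() returns and run it in a session, like this (a_optimizer_col_2 and grad_var_2 presumably play the roles of opt and grads_and_vars in the question):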

...
train_step = a_optimizer_col_2.apply_gradients(grad_var_2, global_step=global_step)
...
with tf.Session() as sess:
    sess.run(train_step, feed_dict={...})
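
To make the difference between defining the op and running it concrete, here is a minimal, self-contained sketch. It is not the asker's model: the one-variable loss (w - 3)^2, the learning rate of 0.1, and the names w, loss, and train_step are assumptions chosen for illustration. It also assumes a TensorFlow installation that provides tensorflow.compat.v1, so that the TF1-style graph API from the question is available.

```python
import tensorflow.compat.v1 as tf

# Use TF1-style graph mode, as in the question (needed on TensorFlow 2.x).
tf.disable_eager_execution()

# Hypothetical toy problem: minimize (w - 3)^2 starting from w = 0.
w = tf.Variable(0.0, name="w")
global_step = tf.Variable(0, trainable=False, dtype=tf.int32, name="global_step")
loss = tf.square(w - 3.0)

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = opt.compute_gradients(loss, var_list=[w])

# This call only adds an update op to the graph; nothing is executed yet.
train_step = opt.apply_gradients(grads_and_vars, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Fetching the gradients evaluates them but does not run the update op,
    # so the variable and the step counter are still at their initial values.
    sess.run([g for g, _ in grads_and_vars])
    w_before, step_before = sess.run([w, global_step])

    # Running the op returned by apply_gradients() performs one update.
    sess.run(train_step)
    w_after, step_after = sess.run([w, global_step])

print("before:", w_before, step_before)
print("after: ", w_after, step_after)
```

In this sketch the gradient at w = 0 is -6, so a single run of train_step moves w from 0.0 to about 0.6 and increments global_step from 0 to 1. In the question's setup, the equivalent change would be to have train() return the op produced by apply_gradients() and include it among the fetches passed to sess.run().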

This Q&A on "python - The apply_gradients() function in TensorFlow does not update the weight and bias variables" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/47708375/
