
python - Conceptual understanding of GradientTape.gradient


Background

In TensorFlow 2 there is a class called GradientTape. It is used to record operations on tensors, whose results can then be differentiated and fed to some minimization algorithm. For example, from the documentation we have this example:

x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)
    y = x * x
dy_dx = g.gradient(y, x)  # Will compute to 6.0

The docstring of the gradient method implies that the first argument can be not only a tensor but also a list of tensors:
def gradient(self,
             target,
             sources,
             output_gradients=None,
             unconnected_gradients=UnconnectedGradients.NONE):
  """Computes the gradient using operations recorded in context of this tape.

  Args:
    target: a list or nested structure of Tensors or Variables to be
      differentiated.
    sources: a list or nested structure of Tensors or Variables. `target`
      will be differentiated against elements in `sources`.
    output_gradients: a list of gradients, one for each element of
      target. Defaults to None.
    unconnected_gradients: a value which can either hold 'none' or 'zero' and
      alters the value which will be returned if the target and sources are
      unconnected. The possible values and effects are detailed in
      'UnconnectedGradients' and it defaults to 'none'.

  Returns:
    a list or nested structure of Tensors (or IndexedSlices, or None),
    one for each element in `sources`. Returned structure is the same as
    the structure of `sources`.

  Raises:
    RuntimeError: if called inside the context of the tape, or if called more
      than once on a non-persistent tape.
    ValueError: if the target is a variable or if unconnected gradients is
      called with an unknown value.
  """

In the first example above, it is easy to see that y, the target, is the function to be differentiated, and x is the variable with respect to which the "gradient" is taken.

From my limited experience, it seems that the gradient method returns a list of tensors, one for each element of sources, and each of these gradients is a tensor with the same shape as the corresponding member of sources.



This description of the behavior of gradient only makes sense if target contains a single 1x1 "tensor" to be differentiated, because mathematically a gradient vector should have the same dimensionality as the domain of the function.

However, if target is a list of tensors, the output of gradient still has the same shape. Why is this the case? If target is thought of as a list of functions, shouldn't the output resemble something like a Jacobian? How do I interpret this behavior conceptually?
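
To make the observation concrete, here is a small sketch I added (assuming TF 2.x): even with a list as target, the returned value has the structure and shape of sources, not a Jacobian-like structure.

import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

with tf.GradientTape() as g:
    y1 = tf.reduce_sum(x * x)    # d(y1)/dx = 2*x
    y2 = tf.reduce_sum(3.0 * x)  # d(y2)/dx = 3

grads = g.gradient([y1, y2], x)  # list target, single source
print(grads.numpy())             # [5. 7. 9.] -- same shape as x, not a 2x3 Jacobian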

Best Answer

That is simply how tf.GradientTape().gradient() is defined. It has the same functionality as tf.gradients(), except that the latter cannot be used in eager mode. From the docs of tf.gradients():

It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys



哪里 xssourcesystarget .

Example 1:

So let's say target = [y1, y2] and sources = [x1, x2]. The result will be:
[dy1/dx1 + dy2/dx1, dy1/dx2 + dy2/dx2]
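
A quick numeric check of this (my own sketch, using scalar variables for x1 and x2):

import tensorflow as tf

x1 = tf.Variable(2.0)
x2 = tf.Variable(3.0)

with tf.GradientTape() as g:
    y1 = x1 * x2  # dy1/dx1 = x2 = 3, dy1/dx2 = x1 = 2
    y2 = x1 * x1  # dy2/dx1 = 2*x1 = 4, dy2/dx2 = 0

grads = g.gradient([y1, y2], [x1, x2])
print([t.numpy() for t in grads])  # [7.0, 2.0] == [dy1/dx1 + dy2/dx1, dy1/dx2 + dy2/dx2]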

Example 2:

Gradients of a per-sample loss (a tensor) versus a reduced loss (a scalar):
Let w, b be two variables.
xentropy = [y1, y2]                 # tensor
reduced_xentropy = 0.5 * (y1 + y2)  # scalar
grads = [dy1/dw + dy2/dw, dy1/db + dy2/db]
reduced_grads = [d(reduced_xentropy)/dw, d(reduced_xentropy)/db]
              = [d(0.5 * (y1 + y2))/dw, d(0.5 * (y1 + y2))/db]
              == 0.5 * grads

A TensorFlow example of the snippet above:

import tensorflow as tf

print(tf.__version__)  # 2.1.0

inputs = tf.convert_to_tensor([[0.1, 0], [0.5, 0.51]])  # two two-dimensional samples
w = tf.Variable(initial_value=inputs)
b = tf.Variable(tf.zeros((2,)))
labels = tf.convert_to_tensor([0, 1])

def forward(inputs, labels, var_list):
    w, b = var_list
    logits = tf.matmul(inputs, w) + b
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    return xentropy

# `xentropy` has two elements (gradient of a tensor - one loss value
# per sample in the batch)
with tf.GradientTape() as g:
    xentropy = forward(inputs, labels, [w, b])
    reduced_xentropy = tf.reduce_mean(xentropy)
grads = g.gradient(xentropy, [w, b])
print(xentropy.numpy())  # [0.6881597  0.71584916]
print(grads[0].numpy())  # [[ 0.20586157 -0.20586154]
                         #  [ 0.2607238  -0.26072377]]

# `reduced_xentropy` is a scalar (gradient of a scalar)
with tf.GradientTape() as g:
    xentropy = forward(inputs, labels, [w, b])
    reduced_xentropy = tf.reduce_mean(xentropy)
grads_reduced = g.gradient(reduced_xentropy, [w, b])
print(reduced_xentropy.numpy())  # 0.70200443 <-- scalar
print(grads_reduced[0].numpy())  # [[ 0.10293078 -0.10293077]
                                 #  [ 0.1303619  -0.13036188]]

If you compute the loss (xentropy) for each element in the batch, the final gradient with respect to each variable will be the sum of the gradients of all samples in the batch (which makes sense).
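
For completeness, here is a sketch I added (not part of the original answer) showing that this "sum over the batch" behavior is exactly what you get by summing the rows of the Jacobian, which GradientTape.jacobian computes explicitly:

import tensorflow as tf

x = tf.Variable([[1.0, 2.0], [3.0, 4.0]])  # two samples, two features

with tf.GradientTape(persistent=True) as g:
    y = tf.reduce_sum(x * x, axis=1)       # per-sample "loss", shape (2,)

grad = g.gradient(y, x)  # shape (2, 2) -- same shape as x
jac = g.jacobian(y, x)   # shape (2, 2, 2) -- d y[i] / d x[j, k]

# Summing the Jacobian over the target dimension reproduces `gradient`.
print(tf.reduce_max(tf.abs(tf.reduce_sum(jac, axis=0) - grad)).numpy())  # 0.0
del g  # release the persistent tape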

Regarding python - Conceptual understanding of GradientTape.gradient, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60665006/
