python - Reparameterization in tensorflow probability: tf.GradientTape() does not compute gradients with respect to the distribution mean

Repost. Author: 行者123. Updated: 2023-11-28 17:58:31
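In tensorflow version 2.0.0-beta1, I am trying to implement a keras layer whose weights are sampled from a normal random distribution. I want the mean of the distribution to be a trainable parameter.

Thanks to the "reparameterization trick" already implemented in tensorflow-probability, it should in principle be possible, if I am not mistaken, to compute gradients with respect to the mean of the distribution.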
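
As a rough illustration of the idea behind the reparameterization trick (a plain-Python sketch, not how tensorflow-probability implements it): a draw from a normal distribution with mean mu and standard deviation sigma can be written as mu + sigma * eps, where eps is standard-normal noise. With the noise held fixed, the sample is an ordinary differentiable function of mu. The helper names sample_reparam and output, and the numbers used, are hypothetical.

```python
import random


def sample_reparam(mu, sigma, eps):
    """Return reparameterized samples mu + sigma * e for each noise value e."""
    return [mu + sigma * e for e in eps]


def output(mu, sigma, x, eps):
    """Dot product of the inputs x with the sampled weights."""
    weights = sample_reparam(mu, sigma, eps)
    return sum(xi * wi for xi, wi in zip(x, weights))


random.seed(0)
x = [0.2, 0.7, 0.1]
eps = [random.gauss(0.0, 1.0) for _ in x]  # noise drawn once, then held fixed
mu, sigma, h = 0.5, 1.0, 1e-6

# Central finite difference with respect to mu, using the same noise twice.
fd_grad = (output(mu + h, sigma, x, eps) - output(mu - h, sigma, x, eps)) / (2 * h)

# Because each weight is mu + sigma * eps_i, d(output)/d(mu) is simply sum(x).
analytic_grad = sum(x)

print(fd_grad, analytic_grad)
```

The two printed values should agree up to floating-point rounding, which is what makes the mean a candidate for gradient-based training.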

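However, when I try to compute the gradient of the network output with respect to the mean variable using tf.GradientTape(), the returned gradient is None.

I created two minimal examples: a layer with deterministic weights and a layer with random weights. The gradients of the deterministic layer are computed as expected, but for the random layer the gradient is None. There is no error message explaining why the gradient is None, and I am somewhat stuck.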

Minimal example code:

A: Here is the minimal example for the deterministic network:

import tensorflow as tf; print(tf.__version__)

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer,Input
from tensorflow.keras.models import Model
from tensorflow.keras.initializers import RandomNormal
import tensorflow_probability as tfp

import numpy as np

# example data
x_data = np.random.rand(99,3).astype(np.float32)

# # A: DETERMINISTIC MODEL

# 1 Define Layer

class deterministic_test_layer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(deterministic_test_layer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(deterministic_test_layer, self).build(input_shape)

    def call(self, x):
        return K.dot(x, self.kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

# 2 Create model and calculate gradient

x = Input(shape=(3,))
fx = deterministic_test_layer(1)(x)
deterministic_test_model = Model(name='test_deterministic',inputs=[x], outputs=[fx])

print('\n\n\nCalculating gradients for deterministic model: ')

for x_now in np.split(x_data,3):
    # print(x_now.shape)
    with tf.GradientTape() as tape:
        fx_now = deterministic_test_model(x_now)
    grads = tape.gradient(
        fx_now,
        deterministic_test_model.trainable_variables,
    )
    print('\n',grads,'\n')

print(deterministic_test_model.summary())

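B: The following example is very similar, but instead of deterministic weights I try to test a layer with randomly sampled weights (sampled at call() time!):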

# # B: RANDOM MODEL

# 1 Define Layer

class random_test_layer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(random_test_layer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.mean_W = self.add_weight('mean_W',
                                      initializer=RandomNormal(mean=0.5,stddev=0.1),
                                      trainable=True)

        self.kernel_dist = tfp.distributions.MultivariateNormalDiag(loc=self.mean_W,scale_diag=(1.,))
        super(random_test_layer, self).build(input_shape)

    def call(self, x):
        sampled_kernel = self.kernel_dist.sample(sample_shape=x.shape[1])
        return K.dot(x, sampled_kernel)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

# 2 Create model and calculate gradient

x = Input(shape=(3,))
fx = random_test_layer(1)(x)
random_test_model = Model(name='test_random',inputs=[x], outputs=[fx])

print('\n\n\nCalculating gradients for random model: ')

for x_now in np.split(x_data,3):
    # print(x_now.shape)
    with tf.GradientTape() as tape:
        fx_now = random_test_model(x_now)
    grads = tape.gradient(
        fx_now,
        random_test_model.trainable_variables,
    )
    print('\n',grads,'\n')

print(random_test_model.summary())

Expected/actual output:

A: The deterministic network works as expected, and the gradients are computed. The output is:

2.0.0-beta1



Calculating gradients for deterministic model:

[<tf.Tensor: id=26, shape=(3, 1), dtype=float32, numpy=
array([[17.79845 ],
[15.764006 ],
[14.4183035]], dtype=float32)>]


[<tf.Tensor: id=34, shape=(3, 1), dtype=float32, numpy=
array([[16.22232 ],
[17.09122 ],
[16.195663]], dtype=float32)>]


[<tf.Tensor: id=42, shape=(3, 1), dtype=float32, numpy=
array([[16.382954],
[16.074356],
[17.718027]], dtype=float32)>]

Model: "test_deterministic"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 3)]               0
_________________________________________________________________
deterministic_test_layer (de (None, 1)                 3
=================================================================
Total params: 3
Trainable params: 3
Non-trainable params: 0
_________________________________________________________________
None

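B: However, for the analogous random network, the gradients are not computed as expected (using the reparameterization trick). Instead, they are None. The full output is: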

Calculating gradients for random model: 

[None]


[None]


[None]

Model: "test_random"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(None, 3)]               0
_________________________________________________________________
random_test_layer (random_te (None, 1)                 1
=================================================================
Total params: 1
Trainable params: 1
Non-trainable params: 0
_________________________________________________________________
None

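Can anyone point out what the problem is here?

Best answer

It seems that tfp.distributions.MultivariateNormalDiag is not differentiable with respect to its input arguments (e.g. loc). In this particular case, the following is equivalent: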

class random_test_layer(Layer):
    ...

    def build(self, input_shape):
        ...
        self.kernel_dist = tfp.distributions.MultivariateNormalDiag(loc=0, scale_diag=(1.,))
        super(random_test_layer, self).build(input_shape)

    def call(self, x):
        sampled_kernel = self.kernel_dist.sample(sample_shape=x.shape[1]) + self.mean_W
        return K.dot(x, sampled_kernel)

In this case, however, the loss is differentiable with respect to self.mean_W.
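
The same pattern can be checked outside Keras with plain TensorFlow. This is a minimal sketch under assumptions: it replaces the tensorflow-probability distribution with tf.random.normal, uses a hypothetical scalar variable mean_w in place of self.mean_W, and assumes TensorFlow 2.x eager execution. The point it illustrates is only that adding the trainable mean to zero-mean noise inside the tape yields a gradient that is not None.

```python
import tensorflow as tf

# Hypothetical stand-ins for the layer input and for self.mean_W.
x = tf.random.uniform((33, 3))
mean_w = tf.Variable(0.5, dtype=tf.float32)

with tf.GradientTape() as tape:
    eps = tf.random.normal(shape=(3, 1))  # zero-mean noise, independent of mean_w
    sampled_kernel = eps + mean_w         # the mean is added inside the tape
    fx = tf.matmul(x, sampled_kernel)

# For a non-scalar target, tape.gradient differentiates the sum of its entries,
# so the gradient with respect to the scalar mean is the sum of all entries of x.
grad = tape.gradient(fx, mean_w)
expected = tf.reduce_sum(x)

print(grad.numpy(), expected.numpy())
```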

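Note: although this approach may work for you, be aware that calling the density function self.kernel_dist.prob will give different results, because loc has been moved outside the distribution.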
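
To make the caveat concrete, here is a small plain-Python sketch that uses a univariate normal density as a stand-in for self.kernel_dist.prob (an assumption for illustration; it is not the tensorflow-probability implementation). The function normal_pdf and the values of mean_w and eps are hypothetical.

```python
import math


def normal_pdf(value, loc, scale=1.0):
    """Density of a univariate normal distribution at value."""
    z = (value - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


mean_w = 0.5             # plays the role of the trainable mean
eps = 0.3                # one draw from the zero-mean distribution
sampled = eps + mean_w   # the shifted sample used as the kernel

# Density reported by the zero-mean distribution for the shifted sample.
p_zero_loc = normal_pdf(sampled, loc=0.0)

# Density under the intended distribution with loc = mean_w.
p_intended = normal_pdf(sampled, loc=mean_w)

# Subtracting the mean before evaluating recovers the intended density.
p_corrected = normal_pdf(sampled - mean_w, loc=0.0)

print(p_zero_loc, p_intended, p_corrected)
```

If the density of a sampled kernel is needed, one option under this setup is to subtract the mean from the sample before evaluating the zero-mean density, as the last line does.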

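Regarding "python - Reparameterization in tensorflow probability: tf.GradientTape() does not compute gradients with respect to the distribution mean", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56936189/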
