
python - Implementing a minimal LSTMCell in Keras using the RNN and Layer classes


I am trying to implement a simple LSTMCell without the "fancy kwargs" that the tf.keras.layers.LSTMCell class implements by default, following a schematic model like this one. It has no real practical purpose; I just want to practice implementing a more complex RNNCell than the one described here in the examples section. My code is the following:

from keras import Input
from keras.layers import Layer, RNN
from keras.models import Model
import keras.backend as K

class CustomLSTMCell(Layer):

    def __init__(self, units, **kwargs):
        self.state_size = units
        super(CustomLSTMCell, self).__init__(**kwargs)

    def build(self, input_shape):

        self.forget_w = self.add_weight(shape=(self.state_size, self.state_size + input_shape[-1]),
                                        initializer='uniform',
                                        name='forget_w')
        self.forget_b = self.add_weight(shape=(self.state_size,),
                                        initializer='uniform',
                                        name='forget_b')

        self.input_w1 = self.add_weight(shape=(self.state_size, self.state_size + input_shape[-1]),
                                        initializer='uniform',
                                        name='input_w1')
        self.input_b1 = self.add_weight(shape=(self.state_size,),
                                        initializer='uniform',
                                        name='input_b1')
        self.input_w2 = self.add_weight(shape=(self.state_size, self.state_size + input_shape[-1]),
                                        initializer='uniform',
                                        name='input_w2')
        self.input_b2 = self.add_weight(shape=(self.state_size,),
                                        initializer='uniform',
                                        name='input_b2')

        self.output_w = self.add_weight(shape=(self.state_size, self.state_size + input_shape[-1]),
                                        initializer='uniform',
                                        name='output_w')
        self.output_b = self.add_weight(shape=(self.state_size,),
                                        initializer='uniform',
                                        name='output_b')

        self.built = True

    def merge_with_state(self, inputs):
        self.stateH = K.concatenate([self.stateH, inputs], axis=-1)

    def forget_gate(self):
        forget = K.dot(self.forget_w, self.stateH) + self.forget_b
        forget = K.sigmoid(forget)
        self.stateC = self.stateC * forget

    def input_gate(self):
        candidate = K.dot(self.input_w1, self.stateH) + self.input_b1
        candidate = K.tanh(candidate)

        amount = K.dot(self.input_w2, self.stateH) + self.input_b2
        amount = K.tanh(amount)

        self.stateC = self.stateC + amount * candidate

    def output_gate(self):
        self.stateH = K.dot(self.output_w, self.stateH) + self.output_b
        self.stateH = K.sigmoid(self.stateH)

        self.stateH = self.stateH * K.tanh(self.stateC)

    def call(self, inputs, states):

        self.stateH = states[0]
        self.stateC = states[1]

        self.merge_with_state(inputs)
        self.forget_gate()
        self.input_gate()
        self.output_gate()

        return self.stateH, [self.stateH, self.stateC]

# Testing
inp = Input(shape=(None, 3))
lstm = RNN(CustomLSTMCell(10))(inp)

model = Model(inputs=inp, outputs=lstm)
inp_value = [[[1, 2, 3], [2, 3, 4], [3, 4, 5]]]
pred = model.predict(inp_value)
print(pred)

However, when I try to test it, an exception is raised with the following message:
IndexError: tuple index out of range

The exception is raised in call, at the line where I set the value of self.stateC. At this point I thought that the states argument of the call function is initially a single tensor rather than a list of tensors, and that this was why I got the error. So I added a self.already_called = False line to the class's __init__ and the following section to the call function:
if not self.already_called:
    self.stateH = K.ones(self.state_size)
    self.stateC = K.ones(self.state_size)
    self.already_called = True
else:
    self.stateH = states[0]
    self.stateC = states[1]

hoping that it would make the problem go away. Instead, it led to another error, this time in the merge_with_state function:

ValueError: Shape must be rank 1 but is rank 2 for 'rnn_1/concat' (op: 'ConcatV2') with input shapes: [10], [?,3], [].

I really do not understand this, since the RNN layer should only "show" the CustomLSTMCell tensors of shape (3), not (None, 3), as axis 0 is the axis it should iterate along. At this point I was sure that I was doing something wrong and should ask the community for help. Basically my question is: what is wrong with my code, and if the answer is "almost everything", then how should I implement an LSTMCell from scratch?
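The premise about the shapes can be checked directly. The following is a minimal sketch (ShapeProbeCell is a hypothetical name, not from the question): a bare cell that does no computation and only prints what the RNN layer hands it, showing that the cell actually receives whole batches at each timestep:

from keras import Input
from keras.layers import Layer, RNN
import keras.backend as K

class ShapeProbeCell(Layer):
    """Bare probe cell: only reports the shapes the RNN layer passes in."""

    def __init__(self, units, **kwargs):
        self.state_size = units
        super(ShapeProbeCell, self).__init__(**kwargs)

    def call(self, inputs, states):
        # Printed once, while Keras traces the graph.
        print('inputs:', K.int_shape(inputs), '| state:', K.int_shape(states[0]))
        return states[0], states

probe_inp = Input(shape=(None, 3))
probe_out = RNN(ShapeProbeCell(10))(probe_inp)
# prints -> inputs: (None, 3) | state: (None, 10)
# The batch axis (None) stays; only the time axis is iterated away.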

Best Answer

OK, it seems I have managed to solve the problem. It turns out it is always useful to read the documentation, in this case the docs for the RNN class. First, the already_called attribute is unnecessary, because the problem lies in the first line of the __init__ function: the state_size attribute should be a list of integers, not just a single integer, like this: self.state_size = [units, units] (because an LSTM of size units needs two states, not one). When I corrected that, I got a different error: the tensor dimensions in forget_gate were incompatible for the addition. This happens because the RNN sees the whole batch at once, rather than each element of the batch separately (hence the shape with None at axis 0). The fix was to add an extra dimension of size 1 at axis 0 to every weight tensor, like this:

self.forget_w = self.add_weight(shape=(1, self.state_size, self.state_size + input_shape[-1]),
                                initializer='uniform',
                                name='forget_w')

and, instead of the dot product, I had to use the K.batch_dot function. The whole working code is then the following:
from keras import Input
from keras.layers import Layer, RNN
from keras.models import Model
import keras.backend as K

class CustomLSTMCell(Layer):

    def __init__(self, units, **kwargs):
        # Two states of size `units`: the hidden state H and the cell state C.
        self.state_size = [units, units]
        super(CustomLSTMCell, self).__init__(**kwargs)

    def build(self, input_shape):
        # The leading axis of size 1 lets each weight broadcast over the
        # batch when used with K.batch_dot.
        self.forget_w = self.add_weight(shape=(1, self.state_size[0], self.state_size[0] + input_shape[-1]),
                                        initializer='uniform',
                                        name='forget_w')
        self.forget_b = self.add_weight(shape=(1, self.state_size[0]),
                                        initializer='uniform',
                                        name='forget_b')

        self.input_w1 = self.add_weight(shape=(1, self.state_size[0], self.state_size[0] + input_shape[-1]),
                                        initializer='uniform',
                                        name='input_w1')
        self.input_b1 = self.add_weight(shape=(1, self.state_size[0]),
                                        initializer='uniform',
                                        name='input_b1')
        self.input_w2 = self.add_weight(shape=(1, self.state_size[0], self.state_size[0] + input_shape[-1]),
                                        initializer='uniform',
                                        name='input_w2')
        self.input_b2 = self.add_weight(shape=(1, self.state_size[0]),
                                        initializer='uniform',
                                        name='input_b2')

        self.output_w = self.add_weight(shape=(1, self.state_size[0], self.state_size[0] + input_shape[-1]),
                                        initializer='uniform',
                                        name='output_w')
        self.output_b = self.add_weight(shape=(1, self.state_size[0]),
                                        initializer='uniform',
                                        name='output_b')

        self.built = True

    def merge_with_state(self, inputs):
        # [h_{t-1}, x_t]: concatenate the previous hidden state with the input.
        self.stateH = K.concatenate([self.stateH, inputs], axis=-1)

    def forget_gate(self):
        forget = K.batch_dot(self.forget_w, self.stateH) + self.forget_b
        forget = K.sigmoid(forget)
        self.stateC = self.stateC * forget

    def input_gate(self):
        candidate = K.batch_dot(self.input_w1, self.stateH) + self.input_b1
        candidate = K.tanh(candidate)

        amount = K.batch_dot(self.input_w2, self.stateH) + self.input_b2
        amount = K.sigmoid(amount)

        self.stateC = self.stateC + amount * candidate

    def output_gate(self):
        self.stateH = K.batch_dot(self.output_w, self.stateH) + self.output_b
        self.stateH = K.sigmoid(self.stateH)

        self.stateH = self.stateH * K.tanh(self.stateC)

    def call(self, inputs, states):
        self.stateH = states[0]
        self.stateC = states[1]

        self.merge_with_state(inputs)
        self.forget_gate()
        self.input_gate()
        self.output_gate()

        return self.stateH, [self.stateH, self.stateC]

inp = Input(shape=(None, 3))
lstm = RNN(CustomLSTMCell(10))(inp)

model = Model(inputs=inp, outputs=lstm)
inp_value = [[[1, 2, 3], [2, 3, 4], [3, 4, 5]]]
pred = model.predict(inp_value)
print(pred)
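As a quick sanity check (a sketch of mine, not part of the original answer), the custom cell can be run next to the built-in keras.layers.LSTMCell; both should map the (1, 3, 3) test batch to an output of shape (1, 10):

from keras.layers import LSTMCell
import numpy as np

# Reference model built on the same input tensor, using the stock cell.
ref_model = Model(inputs=inp, outputs=RNN(LSTMCell(10))(inp))
x = np.array(inp_value, dtype='float32')  # shape (1, 3, 3)

print(model.predict(x).shape)      # (1, 10)
print(ref_model.predict(x).shape)  # (1, 10): same input/output contract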

Edit: in the question, I made a mistake relative to the linked schematic and used the tanh function for amount in input_gate instead of sigmoid. I have edited it in the code here, so it is now correct.
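For reference, the gate structure the code implements is the standard one; the notation below is mine, with W_f, W_1, W_2, W_o mapping onto forget_w, input_w1, input_w2 and output_w, and [h_{t-1}, x_t] being the concatenation performed in merge_with_state:

\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forget gate}\\
\tilde{C}_t &= \tanh\left(W_1\,[h_{t-1}, x_t] + b_1\right) && \text{candidate values}\\
i_t &= \sigma\left(W_2\,[h_{t-1}, x_t] + b_2\right) && \text{input gate (the sigmoid ``amount'')}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{new cell state}\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{output gate}\\
h_t &= o_t \odot \tanh(C_t) && \text{new hidden state}
\end{aligned}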

On python - implementing a minimal LSTMCell in Keras using the RNN and Layer classes, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60185290/
