tensorflow - Sequence to Sequence - for time series prediction

Reposted. Author: 行者123. Updated: 2023-12-04 04:28:58

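I am trying to build a sequence-to-sequence model that predicts how a sensor signal changes over time from its first few inputs. (The original post showed a figure here; the image is not available.)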

The model works fine, but I want to "spice things up" and try adding an attention layer between the two LSTM layers.

Model code:

    # Imports and logs_base_dir are assumed; the original post did not show them.
    import datetime
    import os

    from tensorflow import keras
    from tensorflow.keras.callbacks import TensorBoard

    logs_base_dir = "./logs"


    def train_model(x_train, y_train, n_units=32, n_steps=20, epochs=200,
                    n_steps_out=1):

        # not used by the layers below
        filters = 250
        kernel_size = 3

        logdir = os.path.join(logs_base_dir, datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
        tensorboard_callback = TensorBoard(log_dir=logdir, update_freq=1)

        # get number of features from input data
        n_features = x_train.shape[2]
        # setup network
        # (feel free to use other combination of layers and parameters here)
        model = keras.models.Sequential()
        model.add(keras.layers.LSTM(n_units, activation='relu',
                                    return_sequences=True,
                                    input_shape=(n_steps, n_features)))
        model.add(keras.layers.LSTM(n_units, activation='relu'))
        model.add(keras.layers.Dense(64, activation='relu'))
        model.add(keras.layers.Dropout(0.5))
        model.add(keras.layers.Dense(n_steps_out))
        model.compile(optimizer='adam', loss='mse', metrics=['mse'])
        # train network
        history = model.fit(x_train, y_train, epochs=epochs,
                            validation_split=0.1, verbose=1, callbacks=[tensorboard_callback])
        return model, history

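I have looked at the documentation, but I am a bit lost. Any help adding an attention layer, or any comments on the current model, would be appreciated.

Update:
After some googling, I started to think I had approached this the wrong way, so I rewrote my code.

I am trying to port a seq2seq model that I found in this GitHub repository. In the repository code, the demonstrated problem is predicting a randomly generated sine wave from some earlier samples.

I have a similar problem, and I am trying to change the code to fit my needs.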

Differences:
  • My training data has shape (439, 5, 20): 439 different signals, each with 5 time steps of 20 features
  • I am not using fit_generator when fitting my data

Hyperparameters:
    layers = [35, 35] # Number of hidden neurons in each layer of the encoder and decoder

    learning_rate = 0.01
    decay = 0 # Learning rate decay
    optimiser = keras.optimizers.Adam(lr=learning_rate, decay=decay) # Other possible optimiser "sgd" (Stochastic Gradient Descent)

    num_input_features = train_x.shape[2] # The dimensionality of the input at each time step. Here 20 features per step.
    num_output_features = 1 # The dimensionality of the output at each time step. In this case a 1D signal.
    # There is no reason for the input sequence to be of the same dimension as the output sequence.
    # For instance, using 3 input signals: consumer confidence, inflation and house prices to predict the future house prices.

    loss = "mse" # Other loss functions are possible, see Keras documentation.

    # Regularisation isn't really needed for this application
    lambda_regulariser = 0.000001 # Will not be used if regulariser is None
    regulariser = None # Possible regulariser: keras.regularizers.l2(lambda_regulariser)

    batch_size = 128
    steps_per_epoch = 200 # batch_size * steps_per_epoch = total number of training examples
    epochs = 100

    input_sequence_length = n_steps # Length of the sequence used by the encoder
    target_sequence_length = 31 - n_steps # Length of the sequence predicted by the decoder
    num_steps_to_predict = 20 # Length to use when testing the model

Encoder code:

    # Define an input sequence.
    encoder_inputs = keras.layers.Input(shape=(None, num_input_features), name='encoder_input')

    # Create a list of RNN Cells, these are then concatenated into a single layer
    # with the RNN layer.
    encoder_cells = []
    for hidden_neurons in layers:
        encoder_cells.append(keras.layers.GRUCell(hidden_neurons,
                                                  kernel_regularizer=regulariser,
                                                  recurrent_regularizer=regulariser,
                                                  bias_regularizer=regulariser))

    encoder = keras.layers.RNN(encoder_cells, return_state=True, name='encoder_layer')

    encoder_outputs_and_states = encoder(encoder_inputs)

    # Discard encoder outputs and only keep the states.
    # The outputs are of no interest to us, the encoder's
    # job is to create a state describing the input sequence.
    encoder_states = encoder_outputs_and_states[1:]

Decoder code:

    # The decoder input will be set to zero (see random_sine function of the utils module).
    # Do not worry about the input size being 1, I will explain that in the next cell.
    decoder_inputs = keras.layers.Input(shape=(None, 20), name='decoder_input')

    decoder_cells = []
    for hidden_neurons in layers:
        decoder_cells.append(keras.layers.GRUCell(hidden_neurons,
                                                  kernel_regularizer=regulariser,
                                                  recurrent_regularizer=regulariser,
                                                  bias_regularizer=regulariser))

    decoder = keras.layers.RNN(decoder_cells, return_sequences=True, return_state=True, name='decoder_layer')

    # Set the initial state of the decoder to be the output state of the encoder.
    # This is the fundamental part of the encoder-decoder.
    decoder_outputs_and_states = decoder(decoder_inputs, initial_state=encoder_states)

    # Only select the output of the decoder (not the states)
    decoder_outputs = decoder_outputs_and_states[0]

    # Apply a dense layer with linear activation to set output to correct dimension
    # and scale (tanh is default activation for GRU in Keras, our output sine function can be larger than 1)
    decoder_dense = keras.layers.Dense(num_output_features,
                                       activation='linear',
                                       kernel_regularizer=regulariser,
                                       bias_regularizer=regulariser)

    decoder_outputs = decoder_dense(decoder_outputs)

Model summary:

    model = keras.models.Model(inputs=[encoder_inputs, decoder_inputs],
                               outputs=decoder_outputs)
    model.compile(optimizer=optimiser, loss=loss)
    model.summary()

    Layer (type)                    Output Shape         Param #     Connected to
    ==================================================================================================
    encoder_input (InputLayer)      (None, None, 20)     0
    __________________________________________________________________________________________________
    decoder_input (InputLayer)      (None, None, 20)     0
    __________________________________________________________________________________________________
    encoder_layer (RNN)             [(None, 35), (None,  13335       encoder_input[0][0]
    __________________________________________________________________________________________________
    decoder_layer (RNN)             [(None, None, 35), ( 13335       decoder_input[0][0]
                                                                     encoder_layer[0][1]
                                                                     encoder_layer[0][2]
    __________________________________________________________________________________________________
    dense_5 (Dense)                 (None, None, 1)      36          decoder_layer[0][0]
    ==================================================================================================
    Total params: 26,706
    Trainable params: 26,706
    Non-trainable params: 0
    __________________________________________________________________________________________________

When trying to fit the model:

    history = model.fit([train_x, decoder_inputs],train_y, epochs=epochs,
                        validation_split=0.3, verbose=1)

I get the following error:

    When feeding symbolic tensors to a model, we expect the tensors to have a static batch size. Got tensor with shape: (None, None, 20)

What am I doing wrong?
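
One plausible reading of this error, offered as an assumption rather than something the source confirms: model.fit is being given decoder_inputs, which is the symbolic keras.layers.Input placeholder, where it expects an array of data. The comment in the decoder code says the decoder input is set to zero, so the data side might be sketched as below. The name decoder_zeros, the random stand-in data, and the shape of train_y are all hypothetical; only the (439, 5, 20) input shape and the 31 - n_steps target length come from the question.

```python
import numpy as np

# Stand-in data with the input shape described in the question (values are random).
n_samples, n_steps, n_features = 439, 5, 20
target_sequence_length = 31 - n_steps      # same formula as in the hyperparameters

train_x = np.random.uniform(size=(n_samples, n_steps, n_features))
# Assumed target shape: one output value per predicted time step.
train_y = np.random.uniform(size=(n_samples, target_sequence_length, 1))

# A concrete array of zeros for the decoder, one entry per training sample.
# The last dimension matches the decoder Input(shape=(None, 20)).
decoder_zeros = np.zeros((n_samples, target_sequence_length, 20))

print(train_x.shape, decoder_zeros.shape, train_y.shape)

# Hypothetical call, reusing the model built in the question:
# history = model.fit([train_x, decoder_zeros], train_y, epochs=epochs,
#                     validation_split=0.3, verbose=1)
```

The point of the sketch is only that both entries of the input list are NumPy arrays with the same number of samples; whether zeros are the right decoder input for this sensor data is a separate modelling question.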

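Best answer

The attention layer in Keras is not a trainable layer (unless we use its scale parameter, use_scale). It only computes matrix operations. In my opinion, this layer can lead to some mistakes if applied directly to time series, but let's proceed in order...

The most natural way to replicate the attention mechanism on our time series problem is to adopt the solution proposed here and explained again here. This is the classic application of attention in the encoder-decoder structure in NLP.

Following the TF implementation, our attention layer needs query, value and key tensors in 3D format. We obtain these directly from the recurrent layer. More specifically, we use the sequence output and the hidden state. These are all we need to build an attention mechanism.

The query is the output sequence [batch_dim, time_step, features]

The value is the hidden state [batch_dim, features], to which we add a temporal dimension for the matrix operations [batch_dim, 1, features]

As the key, we use the hidden state as before, so key = value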
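
The shapes involved can be traced with plain NumPy. This is only an illustrative sketch: the sizes and the random arrays seq and state are made up, standing in for the sequence output and last hidden state of a recurrent layer.

```python
import numpy as np

batch, time_steps, features = 5, 20, 32    # illustrative sizes

seq = np.random.uniform(size=(batch, time_steps, features))   # sequence output
state = np.random.uniform(size=(batch, features))             # last hidden state

query = seq                               # (batch, time_steps, features)
value = np.expand_dims(state, axis=1)     # (batch, 1, features)
key = value                               # key == value

# Dot product of every time step with the hidden state:
# (batch, time_steps, features) @ (batch, features, 1) -> (batch, time_steps, 1)
score = query @ np.transpose(key, (0, 2, 1))

print(query.shape, value.shape, score.shape)
```

Each entry score[b, t, 0] is the dot product between time step t of sample b and that sample's hidden state, so the last axis of the score has length 1.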

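In the definition and implementation above I found two problems:

  • The score is computed as softmax(dot(sequence, hidden)). The dot product is fine, but the softmax, following the Keras implementation, is computed over the last dimension and not over the time dimension. This means the scores are all 1, so they are useless
  • The attention output is dot(score, hidden) and not dot(score, sequence), which is what we need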

An example:

    import numpy as np
    import tensorflow as tf

    def attention_keras(query_value):

        query, value = query_value # key == value
        score = tf.matmul(query, value, transpose_b=True) # (batch, timestamp, 1)
        score = tf.nn.softmax(score) # softmax on -1 axis ==> score always = 1 !!!
        print((score.numpy()!=1).any()) # False ==> score always = 1 !!!
        score = tf.matmul(score, value) # (batch, timestamp, feat)
        return score

    np.random.seed(33)
    time_steps = 20
    features = 50
    sample = 5

    X = np.random.uniform(0,5, (sample,time_steps,features))
    state = np.random.uniform(0,5, (sample,features))
    attention_keras([X,tf.expand_dims(state,1)]) # ==> the same as Attention(dtype='float64')([X,tf.expand_dims(state,1)])
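
The consequence of the two problems listed above can also be reproduced without TensorFlow. The sketch below is a plain-Python illustration for a single sample; the softmax helper and the toy numbers are made up for this example. A softmax over an axis of length 1 always returns 1, so dot(score, hidden) just repeats the hidden state at every time step.

```python
import math

def softmax(values):
    """Numerically stable softmax over a plain list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# One sample: 3 time steps with 2 features each, and one hidden state.
sequence = [[0.5, 1.0], [2.0, 0.1], [1.5, 3.0]]
hidden = [0.7, 0.2]

# One dot product per time step, kept as a length-1 list (the last axis).
scores = [[sum(q * h for q, h in zip(step, hidden))] for step in sequence]

# Softmax on the last axis (length 1) gives 1.0 whatever the score is.
weights = [softmax(row) for row in scores]
print(weights)

# dot(score, hidden): every time step receives an unchanged copy of hidden.
output = [[row[0] * h for h in hidden] for row in weights]
print(output)
```

The sequence values never reach the output here, which is why the weighting carries no information about individual time steps.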

For this reason, I propose the following solution for attention on time series:

    import numpy as np
    import tensorflow as tf

    def attention_seq(query_value, scale):

        query, value = query_value
        score = tf.matmul(query, value, transpose_b=True) # (batch, timestamp, 1)
        score = scale*score # scale with a fixed number (it can be finetuned or learned during train)
        score = tf.nn.softmax(score, axis=1) # softmax on timestamp axis
        score = score*query # (batch, timestamp, feat)
        return score

    np.random.seed(33)
    time_steps = 20
    features = 50
    sample = 5

    X = np.random.uniform(0,5, (sample,time_steps,features))
    state = np.random.uniform(0,5, (sample,features))
    attention_seq([X,tf.expand_dims(state,1)], scale=0.05)

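The query is the output sequence [batch_dim, time_step, features]

The value is the hidden state [batch_dim, features], to which we add a temporal dimension for the matrix operations [batch_dim, 1, features]

The weights are computed as softmax(scale*dot(sequence, hidden)). The scale parameter is a scalar value that can be used to scale the weights before applying the softmax. The softmax is now computed correctly over the time dimension. Note that the attention output is the weighted product of the input sequence and the scores. I use the scale as a fixed value, but it can be tuned or inserted as a learnable weight in a custom layer (like the scale parameter in Keras attention).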
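
As a rough check of that description, here is a plain-Python sketch for a single sample. The function attention_seq_single and the toy numbers are hypothetical and are not part of the answer's code; the sketch only mirrors the same steps without TensorFlow: scale the dot products, take a softmax over time, then weight each time step.

```python
import math

def softmax(values):
    """Numerically stable softmax over a plain list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention_seq_single(sequence, hidden, scale):
    """softmax(scale * dot(sequence, hidden)) over time, then weight each step.

    sequence: list of time steps, each a list of features
    hidden:   list of features
    """
    scores = [scale * sum(q * h for q, h in zip(step, hidden)) for step in sequence]
    weights = softmax(scores)                   # one weight per time step
    output = [[w * q for q in step] for w, step in zip(weights, sequence)]
    return weights, output

sequence = [[0.5, 1.0], [2.0, 0.1], [1.5, 3.0]]
hidden = [0.7, 0.2]

for scale in (0.05, 1.0):
    weights, output = attention_seq_single(sequence, hidden, scale)
    print(scale, [round(w, 3) for w in weights], round(sum(weights), 6))
```

The weights sum to 1 across time steps for any scale. A small scale keeps them close to uniform, while a larger one concentrates them on the time steps most aligned with the hidden state, which is why the value is worth tuning.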

In terms of network implementation, there are two possibilities:

    import tensorflow as tf
    from tensorflow.keras.layers import Input, GRU, Attention, Lambda

    ######### KERAS #########
    inp = Input((time_steps,features))
    seq, state = GRU(32, return_state=True, return_sequences=True)(inp)
    att = Attention()([seq, tf.expand_dims(state,1)])

    ######### CUSTOM #########
    inp = Input((time_steps,features))
    seq, state = GRU(32, return_state=True, return_sequences=True)(inp)
    att = Lambda(attention_seq, arguments={'scale': 0.05})([seq, tf.expand_dims(state,1)])

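Conclusion

I don't know how much added value an attention layer can bring to simple problems. If you have short sequences, I suggest leaving the model as it is. What I report here is an answer in which I lay out my own considerations; I welcome comments about possible mistakes or misunderstandings.

In your model, these solutions can be embedded in this way: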
    import tensorflow as tf
    from tensorflow.keras.layers import Input, GRU, Attention, Lambda, Dense, Dropout
    from tensorflow.keras.models import Model

    ######### KERAS #########
    # Input is (time steps, features), matching input_shape in the question's model
    inp = Input((n_steps, n_features))
    seq, state = GRU(n_units, activation='relu',
                     return_state=True, return_sequences=True)(inp)
    att = Attention()([seq, tf.expand_dims(state,1)])
    x = GRU(n_units, activation='relu')(att)
    x = Dense(64, activation='relu')(x)
    x = Dropout(0.5)(x)
    out = Dense(n_steps_out)(x)

    model = Model(inp, out)
    model.compile(optimizer='adam', loss='mse', metrics=['mse'])
    model.summary()

    ######### CUSTOM #########
    # Input is (time steps, features), matching input_shape in the question's model
    inp = Input((n_steps, n_features))
    seq, state = GRU(n_units, activation='relu',
                     return_state=True, return_sequences=True)(inp)
    att = Lambda(attention_seq, arguments={'scale': 0.05})([seq, tf.expand_dims(state,1)])
    x = GRU(n_units, activation='relu')(att)
    x = Dense(64, activation='relu')(x)
    x = Dropout(0.5)(x)
    out = Dense(n_steps_out)(x)

    model = Model(inp, out)
    model.compile(optimizer='adam', loss='mse', metrics=['mse'])
    model.summary()

Regarding tensorflow - Sequence to Sequence - for time series prediction, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61757475/
