tensorflow - Multivariate binary sequence prediction with LSTM

Reposted by: 行者123. Updated: 2023-12-04 01:49:21

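I am working on a sequence prediction problem and I don't have much experience in this area, so some of the questions below may be naive.

FYI: I have created a follow-up question focused on CRFs here

I have the following problem:

I would like to forecast binary sequences for multiple, non-independent variables.

Inputs:

I have a dataset with the following variables:

Timestamps

Groups A and B

A binary signal for each group at a particular timestamp

Additionally, suppose the following:

We can extract additional attributes from the timestamps (e.g. hour of day), which can be used as external predictors

We believe that groups A and B are not independent, so it might be optimal to model their behaviour jointly

binary_signal_group_A and binary_signal_group_B are the 2 non-independent variables that I would like to forecast using (1) their past behaviour and (2) additional information extracted from each timestamp.

What I've done so far: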

# required libraries
import re
import numpy as np
import pandas as pd
from keras import Sequential
from keras.layers import LSTM

data_length = 18  # how long our data series will be
shift_length = 3  # how long of a sequence do we want

df = (pd.DataFrame  # create a sample dataframe
    .from_records(np.random.randint(2, size=[data_length, 3]))
    .rename(columns={0:'a', 1:'b', 2:'extra'}))
# NOTE: the 'extra' variable refers to a generic predictor such as for example 'is_weekend' indicator, it doesn't really matter what it is

# shift so that our sequences are in rows (assuming data is sorted already)
colrange = df.columns
shift_range = [_ for _ in range(-shift_length, shift_length+1) if _ != 0]
for c in colrange:
    for s in shift_range:
        if not (c == 'extra' and s > 0):
            charge = 'next' if s > 0 else 'last'  # 'next' variables is what we want to predict
            formatted_s = '{0:02d}'.format(abs(s))
            new_var = '{var}_{charge}_{n}'.format(var=c, charge=charge, n=formatted_s)
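            # shift(-s): a negative shift pulls future values up ('next'), a positive shift pushes past values down ('last')
            df[new_var] = df[c].shift(-s)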

# drop unnecessary variables and trim missings generated by the shift operation
df.dropna(axis=0, inplace=True)
df.drop(colrange, axis=1, inplace=True)
df = df.astype(int)
df.head()  # check it out

#   a_last_03  a_last_02      ...        extra_last_02  extra_last_01
# 3          0          1      ...                    0              1
# 4          1          0      ...                    0              0
# 5          0          1      ...                    1              0
# 6          0          0      ...                    0              1
# 7          0          0      ...                    1              0
# [5 rows x 15 columns]

# separate predictors and response
response_df_dict = {}
for g in ['a','b']:
    response_df_dict[g] = df[[c for c in df.columns if 'next' in c and g in c]]

# reformat for LSTM
# the response for every row is a matrix with depth of 2 (the number of groups) and width = shift_length
# the predictors are of the same dimensions except the depth is not 2 but the number of predictors that we have

response_array_list = []
# raw string for the regex; sorted() keeps the channel order reproducible between runs
col_prefix = sorted(set([re.sub(r'_\d+$', '', c) for c in df.columns if 'next' not in c]))
for c in col_prefix:
    current_array = df[[z for z in df.columns if z.startswith(c)]].values
    response_array_list.append(current_array)

# rearrange into (samples, timesteps, channels/variables)
# NOTE: np.reshape only reinterprets the flat buffer and would scramble the values;
# np.transpose is what actually moves the channel axis to the end
response_array = np.array([response_df_dict['a'].values, response_df_dict['b'].values])
response_array = np.transpose(response_array, (1, 2, 0))
predictor_array = np.array(response_array_list)
predictor_array = np.transpose(predictor_array, (1, 2, 0))

# feed into the model
model = Sequential()
model.add(LSTM(8, input_shape=(predictor_array.shape[1],predictor_array.shape[2]), return_sequences=True))  # the number of neurons here can be anything
model.add(LSTM(2, return_sequences=True))  # should I use an activation function here? the number of neurons here must be equal to the # of groups we are predicting
model.summary()

# _________________________________________________________________
# Layer (type)                 Output Shape              Param #   
# =================================================================
# lstm_62 (LSTM)               (None, 3, 8)              384       
# _________________________________________________________________
# lstm_63 (LSTM)               (None, 3, 2)              88        
# =================================================================
# Total params: 472
# Trainable params: 472
# Non-trainable params: 0

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])  # is it valid to use crossentropy and accuracy as metric?
model.fit(predictor_array, response_array, epochs=10, batch_size=1)
model_preds = model.predict_classes(predictor_array)  # not gonna worry about train/test split here
model_preds.shape  # should return (12, 3, 2) or (# of records, # of timestamps, # of groups which are a and b)

# (12, 3)

model_preds
# array([[1, 0, 0],
#        [0, 0, 0],
#        [1, 0, 0],
#        [0, 0, 0],
#        [1, 0, 0],
#        [0, 0, 0],
#        [0, 0, 0],
#        [0, 0, 0],
#        [0, 0, 0],
#        [0, 0, 0],
#        [1, 0, 0],
#        [0, 0, 0]])

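Questions:

The main question here is this: how do I get this working so that the model would forecast the next N sequences for both groups?

Additionally, I would like to ask the following questions:

Groups A and B are expected to be cross-correlated, however, is it valid to attempt to output both A and B sequences by a single model or should I fit 2 separate models, one predicting A, the other one predicting B but both using historical A and B data as inputs?

While my last layer in the model is an LSTM of shape (None, 3, 2), the prediction output is of shape (12, 3) when I would have expected it to be (12, 2) -- am I doing something wrong here and if so, how would I fix this?

As far as the output LSTM layer is concerned, would it be a good idea to use an activation function here, such as sigmoid? Why/why not?

Is it valid to use a classification type loss (binary cross-entropy) and metrics (accuracy) for optimizing a sequence?

Is an LSTM model an optimal choice here? Does anyone think that a CRF or some HMM-type model would work better here?

Many thanks!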

Best answer

I will answer all the questions in order.

how do I get this working so that the model would forecast the next N sequences for both groups?

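I would suggest two modifications to your model.
First, use a sigmoid activation in the last layer.

Why? Consider the binary cross-entropy loss function (I borrowed the equation from here)
$L=-y\ln(p) -(1-y)\ln(1-p)$
where L is the computed loss, p is the network prediction, and y is the target value.

The loss is defined for $p\in(0,1)$.
If p falls outside this open interval, the loss is undefined. The default activation of the LSTM layer in Keras is tanh, whose output range is (-1, 1). This means the model's output is not suitable for binary cross-entropy loss. If you try to train the model, you may end up with nan as the loss.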
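To make the point about the loss domain concrete, here is a minimal NumPy sketch. The helper names `binary_crossentropy` and `sigmoid` and the sample values are illustrative assumptions, not part of the original answer or of Keras. The sketch evaluates the formula above on raw values in (-1, 1), such as a tanh layer could emit, and on the same values squashed through a sigmoid.

```python
import numpy as np

def binary_crossentropy(y, p):
    """Element-wise L = -y*ln(p) - (1-y)*ln(1-p), with no clipping."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    # Suppress NumPy's warnings so the nan appears in the result instead.
    with np.errstate(invalid="ignore", divide="ignore"):
        return -y * np.log(p) - (1.0 - y) * np.log(1.0 - p)

def sigmoid(x):
    """Map any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

raw = np.array([-0.8, 0.3, 0.9])     # hypothetical tanh-style outputs
target = np.array([1.0, 0.0, 1.0])   # hypothetical binary targets

loss_raw = binary_crossentropy(target, raw)               # ln(-0.8) is undefined
loss_sigmoid = binary_crossentropy(target, sigmoid(raw))  # every term is defined

print(loss_raw)
print(loss_sigmoid)
```

The first entry of `loss_raw` is nan because the logarithm of a negative prediction is undefined, while every entry of `loss_sigmoid` is finite.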

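The second modification (which is really part of the first) is to apply the sigmoid activation at the last layer. You have three options for doing this.

Add a dense layer with sigmoid activation between the last LSTM layer and the output.

Or change the activation of the LSTM layer to sigmoid.

Or add an activation layer with sigmoid activation after the output layer.

Even though all three would work, I would suggest the dense layer with sigmoid activation, because it almost always works better.
The model with the suggested changes would then be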

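# Dense and TimeDistributed are used below but were not imported in the question
from keras.layers import Dense, TimeDistributed

model = Sequential()
model.add(LSTM(8, input_shape=(predictor_array.shape[1], predictor_array.shape[2]), return_sequences=True))
model.add(LSTM(2, return_sequences=True))
# apply the same sigmoid Dense layer at every timestep, one output unit per group (a and b)
model.add(TimeDistributed(Dense(2, activation="sigmoid")))
model.summary()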
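The model above expects predictors of shape (samples, timesteps, features) and responses of shape (samples, timesteps, groups). As a rough sketch of one way to build such arrays directly with NumPy, the snippet below slices contiguous windows from a toy series. The names `n_past`, `n_future`, `X` and `y` and the random data are assumptions made for illustration. The windowing also differs slightly from the question's shift-based approach, which skips the current row.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(18, 3))  # toy columns: a, b, extra

n_past, n_future = 3, 3                  # history length and forecast horizon
n_samples = len(data) - n_past - n_future + 1

# Predictors: the previous n_past steps of all three columns.
X = np.stack([data[i:i + n_past] for i in range(n_samples)])

# Responses: the next n_future steps of the two groups (a, b) only.
y = np.stack([data[i + n_past:i + n_past + n_future, :2]
              for i in range(n_samples)])

print(X.shape, y.shape)
```

Stacking slices keeps the time axis in the middle and the variables on the last axis, so no reshape is needed afterwards.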

... is it valid to attempt to output both A and B sequences by a single model or should I fit 2 separate models ... ?

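Ideally, both approaches can work. However, recent research such as this one shows that the former (a single model for both groups) tends to perform better. This approach is generally called Multi Task Learning. The idea behind multi-task learning is very broad; for simplicity, it can be thought of as adding inductive bias by forcing the model to learn a hidden representation that is common to multiple tasks.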
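As an illustration of the shared-representation idea, here is a hedged sketch using the Keras functional API. It is one possible layout, not the answer's own model: a single LSTM is shared by two sigmoid heads, one per group. The layer names `group_a` and `group_b`, the layer sizes and the random input are assumptions.

```python
import numpy as np
from keras import Input, Model
from keras.layers import LSTM, Dense, TimeDistributed

n_steps, n_features = 3, 3  # assumed window length and number of predictors

inputs = Input(shape=(n_steps, n_features))
# Hidden representation shared by both tasks.
shared = LSTM(8, return_sequences=True)(inputs)
# One sigmoid head per group, applied at every timestep.
out_a = TimeDistributed(Dense(1, activation="sigmoid"), name="group_a")(shared)
out_b = TimeDistributed(Dense(1, activation="sigmoid"), name="group_b")(shared)

model = Model(inputs=inputs, outputs=[out_a, out_b])
model.compile(
    optimizer="adam",
    loss={"group_a": "binary_crossentropy", "group_b": "binary_crossentropy"},
)

# Random stand-in predictors, only to check the output shapes.
X = np.random.randint(2, size=(12, n_steps, n_features)).astype("float32")
pred_a, pred_b = model.predict(X, verbose=0)
print(pred_a.shape, pred_b.shape)
```

A single `Dense(2)` head, as in the answer's model, shares even more parameters. Separate heads make it easier to weight or inspect the loss per group.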

... the prediction output is of shape (12, 3) when I would have expected it to be (12, 2) -- am I doing something wrong here ... ?

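You get this because you are using the predict_classes method. Unlike predict, predict_classes returns the index of the maximum along the channel axis (the third axis in your case), which collapses that axis. As explained above, if you use a sigmoid activation in the last layer and replace predict_classes with predict, you will get what you expect: an output of shape (12, 3, 2), i.e. (records, timesteps, groups), holding one probability per group and timestep.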
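The shape difference can be reproduced without a model. In the sketch below, `probs` is a random stand-in for what `model.predict` would return (an assumption for illustration). Taking the argmax over the last axis mimics what `predict_classes` did for an output with more than one unit, whereas thresholding keeps one binary decision per group.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for model.predict(...): (samples, timesteps, groups) probabilities.
probs = rng.random((12, 3, 2))

# predict_classes-style result: index of the larger value on the last axis.
argmax_labels = probs.argmax(axis=-1)

# Independent binary decision for every group and timestep.
binary_labels = (probs > 0.5).astype(int)

print(argmax_labels.shape)  # the group axis is gone
print(binary_labels.shape)  # the group axis is preserved
```

The 0.5 cut-off is a conventional default rather than something the answer prescribes. It can be tuned per group if the classes are imbalanced.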

As far as the output LSTM layer is concerned, would it be a good idea to use an activation function here, such as sigmoid? Why/why not?

I hope I have already explained this above. The answer is yes.

Is it valid to use a classification type loss (binary cross-entropy) and metrics (accuracy) for optimizing a sequence?

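Since your targets are binary signals (which follow a Bernoulli distribution), yes, it is valid to use a binary loss and the accuracy metric. This answer gives more details on why binary cross-entropy is valid for this type of target variable.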
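For a sequence output, accuracy is just the share of matching elements, and it can be broken down by group or by forecast step. The following is a small sketch on random stand-in arrays. All names and values are illustrative assumptions, not results from the model.

```python
import numpy as np

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=(12, 3, 2))  # (samples, timesteps, groups)
probs = rng.random((12, 3, 2))                # stand-in predicted probabilities
y_pred = (probs > 0.5).astype(int)

hits = (y_pred == y_true)
overall_acc = hits.mean()               # over every element
per_group_acc = hits.mean(axis=(0, 1))  # one value per group (a, b)
per_step_acc = hits.mean(axis=(0, 2))   # one value per forecast step

print(overall_acc, per_group_acc, per_step_acc)
```

Looking at the per-group and per-step values can reveal, for example, that one group or the furthest forecast step is predicted much worse than the overall figure suggests.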

Is an LSTM model an optimal choice here? Does anyone think that a CRF or some HMM-type model would work better here?

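It depends on the data available and on the complexity of the network you choose. CRF and HMM models are simple and tend to work better when little data is available. But if the available dataset is large, an LSTM will almost always outperform both CRF and HMM. My suggestion: if you have a lot of data, use an LSTM; if you have little data or are looking for a simple model, use a CRF or an HMM.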
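To give a feel for how small a "simple model" can be, the sketch below fits a first-order Markov chain over the four joint states of (a, b). This is neither a CRF nor an HMM (it has no hidden states and no extra features); it is a deliberately simpler stand-in, chosen here only for illustration, that still models the two groups jointly. The toy data and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
signals = rng.integers(0, 2, size=(200, 2))  # toy columns: a, b

# Encode each pair (a, b) as one joint state in 0..3.
states = signals[:, 0] * 2 + signals[:, 1]

# Count transitions state[t] -> state[t+1], starting from 1 (add-one smoothing).
counts = np.ones((4, 4))
np.add.at(counts, (states[:-1], states[1:]), 1)
transition = counts / counts.sum(axis=1, keepdims=True)

# One-step forecast: most likely next joint state, decoded back to (a, b).
next_state = int(transition[states[-1]].argmax())
next_a, next_b = divmod(next_state, 2)

print(transition.round(2))
print(next_a, next_b)
```

A baseline like this has only a handful of parameters, so it can be estimated from very little data and gives a reference point against which a larger LSTM can be compared.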

This post on tensorflow - Multivariate binary sequence prediction with LSTM is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/53977695/
