
python - How to do sequence labeling with an LSTM in Python?

Reposted. Author: 行者123. Last updated: 2023-11-28 19:15:30

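I want to build a classifier that outputs a label for each step of a time series of vectors. I have code for a static LSTM-based classifier, but I don't know how to incorporate the temporal information: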

Training set:

time   = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,11,12,13,14,15,16,17,18]
f1 = [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2]
f2 = [2, 1, 3, 2, 4, 2, 3, 1, 9, 2, 1, 2, 1, 6, 1, 8, 2, 2]
labels = [a, a, b, b, a, a, b, b, a, a, b, b, a, a, b, b, a, a]

Test set:

time   = [1, 2, 3, 4, 5, 6]
f1 = [2, 2, 2, 1, 1, 1]
f2 = [2, 1, 2, 1, 6, 1]
labels = [?, ?, ?, ?, ?, ?]
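The time index in these tables is implicit in the list order, and a recurrent network only sees that order through the order in which samples are fed to it within one sequence. As a minimal sketch, assuming the labels are plain strings and that each time step should become one (feature vector, class id) pair, the parallel lists can be zipped like this. The helper name `to_labeled_sequence` and the `label_to_id` mapping are illustrative, not part of pybrain:

```python
# Training data from the question; the labels a/b are written as strings here.
f1 = [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2]
f2 = [2, 1, 3, 2, 4, 2, 3, 1, 9, 2, 1, 2, 1, 6, 1, 8, 2, 2]
labels = ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b', 'a',
          'a', 'b', 'b', 'a', 'a', 'b', 'b', 'a', 'a']


def to_labeled_sequence(features, labels, label_to_id):
    """Zip parallel feature lists into time-ordered ([f1, f2], class_id) pairs."""
    steps = list(zip(*features))  # one tuple of feature values per time step
    if len(steps) != len(labels):
        raise ValueError("features and labels must cover the same time steps")
    return [(list(step), label_to_id[label])
            for step, label in zip(steps, labels)]


# Map each distinct label to an integer class id (hypothetical encoding).
label_to_id = {label: i for i, label in enumerate(sorted(set(labels)))}
train_sequence = to_labeled_sequence([f1, f2], labels, label_to_id)
```

Each pair in `train_sequence` can then be appended to a sequential dataset in order, so that the position in the list plays the role of the `time` row.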

Following this post, I implemented the following in pybrain:

from __future__ import print_function  # print() then behaves the same on Python 2 and 3

from itertools import cycle
from sys import stdout

import matplotlib.pyplot as plt
from pybrain.datasets import SequentialDataSet
from pybrain.structure.modules import LSTMLayer
from pybrain.supervised import RPropMinusTrainer
from pybrain.tools.shortcuts import buildNetwork

data = [1,2,3,4,5,6,7]

# One input and one target per sample: each value is paired with the next one.
ds = SequentialDataSet(1, 1)
for sample, next_sample in zip(data, cycle(data[1:])):
    ds.addSample(sample, next_sample)

print(ds)
# The first argument (input size) has to match the dataset's input dimension, which is 1 here.
net = buildNetwork(1, 5, 1, hiddenclass=LSTMLayer, outputbias=False, recurrent=True)

trainer = RPropMinusTrainer(net, dataset=ds)
train_errors = [] # save errors for plotting later
EPOCHS_PER_CYCLE = 5
CYCLES = 100
EPOCHS = EPOCHS_PER_CYCLE * CYCLES
for i in range(CYCLES):
    trainer.trainEpochs(EPOCHS_PER_CYCLE)
    train_errors.append(trainer.testOnData())
    epoch = (i+1) * EPOCHS_PER_CYCLE
    # "\r" plus end="" rewrites the same console line on every cycle
    print("\r epoch {}/{}".format(epoch, EPOCHS), end="")
    stdout.flush()

print()
print("final error =", train_errors[-1])

plt.plot(range(0, EPOCHS, EPOCHS_PER_CYCLE), train_errors)
plt.xlabel('epoch')
plt.ylabel('error')
plt.show()

# Feed the sequence back through the network one step at a time.
for sample, target in ds.getSequenceIterator(0):
    print(" sample = %4.1f" % sample)
    print("predicted next sample = %4.1f" % net.activate(sample))
    print(" actual next sample = %4.1f" % target)
    print()

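This trains a classifier, but I don't know how to incorporate the temporal information. How do I include information about the order of the vectors?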

Best answer

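This is how I implemented sequence labeling. I have six label classes, with 20 sample sequences per class. Each sequence consists of 100 time steps of data points, with 10 variables per time step.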

from pybrain.datasets import SequenceClassificationDataSet

input_variable = 10  # variables per time step
output_label = 1     # one class label per time step
trndata = SequenceClassificationDataSet(input_variable, output_label, nb_classes=6)

# input 1st sequence into dataset for class label 0
for i in range(100):
    trndata.appendLinked(sequence1_class0[i,:], [0])
trndata.newSequence()

# input 2nd sequence into dataset for class label 0
for i in range(100):
    trndata.appendLinked(sequence2_class0[i,:], [0])
trndata.newSequence()
# ......
# ......

# input 20th sequence into dataset for class label 5
for i in range(100):
    trndata.appendLinked(sequence20_class5[i,:], [5])
trndata.newSequence()

You can eventually put all of this into a single for loop. trndata.newSequence() is called each time a new sample sequence is given to the dataset.
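The single loop mentioned above could look roughly like the sketch below. It assumes the sequences are held in a dict that maps a class id to a list of sequences, which is a hypothetical layout, not something the answer specifies. It also calls newSequence() before each sequence's samples rather than after, on the assumption that the method marks the start of a sequence; the answer's snippets call it after each one. `RecordingDataSet` is a stand-in written for this example so that it runs without PyBrain, and is not a PyBrain class:

```python
def fill_dataset(dataset, sequences_by_class):
    """Append every sequence to `dataset`, with one newSequence() call per sequence.

    `sequences_by_class` maps a class id to a list of sequences; each sequence
    is a list of per-time-step feature vectors (hypothetical layout).
    """
    for class_id, sequences in sorted(sequences_by_class.items()):
        for sequence in sequences:
            dataset.newSequence()  # assumed to mark the start of a sequence
            for step in sequence:
                dataset.appendLinked(step, [class_id])


class RecordingDataSet(object):
    """Stand-in for illustration only (NOT PyBrain): records what it is given."""

    def __init__(self):
        self.sequences = []

    def newSequence(self):
        self.sequences.append([])

    def appendLinked(self, inp, target):
        self.sequences[-1].append((list(inp), list(target)))


# Toy data: 2 classes, 2 sequences per class, 3 time steps, 2 variables.
toy = {c: [[[c, t] for t in range(3)] for _ in range(2)] for c in range(2)}
recorder = RecordingDataSet()
fill_dataset(recorder, toy)
```

With PyBrain available, an object such as the answer's `trndata` would presumably be passed in place of `recorder`, and the dict would hold the six classes of twenty 100-step sequences.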

Training the network should be similar to your existing code.

Regarding "python - How to do sequence labeling with an LSTM in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33884135/
