
python - Recurrent network (RNN) doesn't learn a very simple function (plots shown in the question)


I am trying to train a simple recurrent network to detect a "burst" in an input signal. The following figure shows the input signal (blue) and the desired (classification) output of the RNN, shown in red.

The end of the sine-shaped input signal burst should be detected.

So the output of the network should switch from 1 to 0 as soon as a burst is detected, and then stay at that output. The only thing that varies between the input sequences used to train the RNN is the time step at which the burst occurs.

Following the tutorial at https://github.com/MorvanZhou/PyTorch-Tutorial/blob/master/tutorial-contents/403_RNN_regressor.py, I cannot get the RNN to learn. The learned RNN always operates in a "memoryless" way, i.e., it does not use memory for its predictions, as shown in the following example behavior:

The same plot as before, but this time with the output behavior of the network.

The green line shows the predicted output of the network. What am I doing wrong in this example that keeps the network from learning correctly? Isn't the network's task fairly simple?

I am using:

  1. torch.nn.CrossEntropyLoss as the loss function
  2. The Adam optimizer for learning
  3. An RNN with 16 internal/hidden nodes and 2 output nodes, using the default activation function of the torch.nn.RNN class

The experiment has been repeated several times with different random seeds, with little difference in the results. I used the following code:

import torch
import numpy, math
import matplotlib.pyplot as plt

nofSequences = 5
maxLength = 130

# Generate training data
x_np = numpy.zeros((nofSequences, maxLength, 1))
y_np = numpy.zeros((nofSequences, maxLength))
numpy.random.seed(1)
for i in range(0, nofSequences):
    startPos = numpy.random.random() * 50
    for j in range(0, maxLength):
        if j >= startPos and j < startPos + 10:
            x_np[i, j, 0] = math.sin((j - startPos) * math.pi / 10)
        else:
            x_np[i, j, 0] = 0.0
        if j < startPos + 10:
            y_np[i, j] = 1
        else:
            y_np[i, j] = 0


# Define the neural network
INPUT_SIZE = 1

class RNN(torch.nn.Module):
    def __init__(self):
        super(RNN, self).__init__()

        self.rnn = torch.nn.RNN(
            input_size=INPUT_SIZE,
            hidden_size=16,   # rnn hidden units
            num_layers=1,     # number of rnn layers
            batch_first=True,
        )
        self.out = torch.nn.Linear(16, 2)

    def forward(self, x, h_state):
        r_out, h_state = self.rnn(x, h_state)

        outs = []  # save all predictions
        for time_step in range(r_out.size(1)):  # calculate output for each time step
            outs.append(self.out(r_out[:, time_step, :]))
        return torch.stack(outs, dim=1), h_state

# Learn the network
rnn = RNN()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.01)
h_state = None  # for initial hidden state

x = torch.Tensor(x_np)  # shape (batch, time_step, input_size)
y = torch.Tensor(y_np).long()

torch.manual_seed(2)
numpy.random.seed(2)

for step in range(100):

    prediction, h_state = rnn(x, h_state)  # rnn output

    # !! next step is important !!
    h_state = h_state.data  # repack the hidden state, break the connection from last iteration

    loss = torch.nn.CrossEntropyLoss()(prediction.reshape((-1, 2)), y.reshape((-1,)))  # calculate loss
    optimizer.zero_grad()  # clear gradients for this training step
    loss.backward()        # backpropagation, compute gradients
    optimizer.step()       # apply gradients

    errTrain = (prediction.max(2)[1].data != y).float().mean()
    print("Error Training:", errTrain.item())

For those who want to reproduce the experiment, the plots can be drawn with the following code (using a Jupyter notebook):

steps = range(0,maxLength)
plotChoice = 3

plt.figure(1, figsize=(12, 5))
plt.ion() # continuously plot

plt.plot(steps, y_np[plotChoice,:].flatten(), 'r-')
plt.plot(steps, numpy.argmax(prediction.detach().numpy()[plotChoice,:,:],axis=1), 'g-')
plt.plot(steps, x_np[plotChoice,:,0].flatten(), 'b-')

plt.ioff()
plt.show()

Best answer

From the documentation of torch.nn.RNN, the RNN is in fact an Elman network, whose properties can be seen here. The output of an Elman network depends only on the hidden state, while the hidden state depends on the last input and the previous hidden state.
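For reference, here is a minimal sketch of a single Elman update as documented for torch.nn.RNN (tanh is the default nonlinearity; the weight names mirror PyTorch's parameter naming):

import torch

def elman_step(x_t, h_prev, W_ih, b_ih, W_hh, b_hh):
    # h_t depends only on the current input x_t and the previous hidden
    # state h_prev -- there is no separate long-term memory cell
    return torch.tanh(x_t @ W_ih.T + b_ih + h_prev @ W_hh.T + b_hh)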

Since we set "h_state = h_state.data", we effectively use the hidden state of the last sequence to predict the first state of the new sequence, which makes the output depend heavily on the last output of the previous sequence (which is 0). The Elman network cannot tell whether we are at the beginning or the end of a sequence; it only "sees" the state and the last input.
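To make the difference concrete, here is a minimal standalone sketch (using a dummy batch shaped like the training data) contrasting the two ways of handling the hidden state:

import torch

rnn = torch.nn.RNN(input_size=1, hidden_size=16, batch_first=True)
x = torch.zeros(5, 130, 1)  # dummy batch, shape (batch, time_step, input_size)

# Problematic: the final hidden state of one pass seeds the next pass,
# so later passes start from a state that already encodes "output 0"
h_state = None
out, h_state = rnn(x, h_state)
h_state = h_state.data  # detached from the graph, but still carried over

# Fix: start every pass from a zero hidden state
out, _ = rnn(x, None)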

To fix this we can instead set "h_state = None". Now every new sequence starts with an empty state, which leads to the following prediction (where the green line again shows the prediction). [plot: prediction with the hidden state reset] Now we start at 1, but quickly dip down to 0 before the pulse pushes the output back up again. The Elman network can account for some time dependency, but it is not good at remembering long-term dependencies, and it converges towards the "most common output" for that input.

So to fix this problem, I suggest using a network well known for handling long-term dependencies, namely the Long Short-Term Memory (LSTM) RNN; for more information see torch.nn.LSTM. Keep "h_state = None" and change torch.nn.RNN to torch.nn.LSTM.
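As a minimal sketch of the swap (note that torch.nn.LSTM returns its state as an (h_n, c_n) tuple, so passing None conveniently zero-initializes both the hidden state and the cell state):

import torch

lstm = torch.nn.LSTM(input_size=1, hidden_size=16, num_layers=1, batch_first=True)
x = torch.zeros(5, 130, 1)         # (batch, time_step, input_size)
r_out, (h_n, c_n) = lstm(x, None)  # None -> zero-initialized h_0 and c_0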

The complete code and plot are shown below.

import torch
import numpy, math
import matplotlib.pyplot as plt

nofSequences = 5
maxLength = 130

# Generate training data
x_np = numpy.zeros((nofSequences, maxLength, 1))
y_np = numpy.zeros((nofSequences, maxLength))
numpy.random.seed(1)
for i in range(0, nofSequences):
    startPos = numpy.random.random() * 50
    for j in range(0, maxLength):
        if j >= startPos and j < startPos + 10:
            x_np[i, j, 0] = math.sin((j - startPos) * math.pi / 10)
        else:
            x_np[i, j, 0] = 0.0
        if j < startPos + 10:
            y_np[i, j] = 1
        else:
            y_np[i, j] = 0


# Define the neural network
INPUT_SIZE = 1

class RNN(torch.nn.Module):
    def __init__(self):
        super(RNN, self).__init__()

        self.rnn = torch.nn.LSTM(
            input_size=INPUT_SIZE,
            hidden_size=16,   # rnn hidden units
            num_layers=1,     # number of rnn layers
            batch_first=True,
        )
        self.out = torch.nn.Linear(16, 2)

    def forward(self, x, h_state):
        r_out, h_state = self.rnn(x, h_state)

        outs = []  # save all predictions
        for time_step in range(r_out.size(1)):  # calculate output for each time step
            outs.append(self.out(r_out[:, time_step, :]))
        return torch.stack(outs, dim=1), h_state

# Learn the network
rnn = RNN()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.01)
h_state = None  # for initial hidden state

x = torch.Tensor(x_np)  # shape (batch, time_step, input_size)
y = torch.Tensor(y_np).long()

torch.manual_seed(2)
numpy.random.seed(2)

for step in range(100):

    prediction, h_state = rnn(x, h_state)  # rnn output

    # !! next step is important !!
    h_state = None

    loss = torch.nn.CrossEntropyLoss()(prediction.reshape((-1, 2)), y.reshape((-1,)))  # calculate loss
    optimizer.zero_grad()  # clear gradients for this training step
    loss.backward()        # backpropagation, compute gradients
    optimizer.step()       # apply gradients

    errTrain = (prediction.max(2)[1].data != y).float().mean()
    print("Error Training:", errTrain.item())


###############################################################################
steps = range(0,maxLength)
plotChoice = 3

plt.figure(1, figsize=(12, 5))
plt.ion() # continuously plot

plt.plot(steps, y_np[plotChoice,:].flatten(), 'r-')
plt.plot(steps, numpy.argmax(prediction.detach().numpy()[plotChoice,:,:],axis=1), 'g-')
plt.plot(steps, x_np[plotChoice,:,0].flatten(), 'b-')

plt.ioff()
plt.show()

[plot: prediction of the trained LSTM network]

Regarding "python - Recurrent network (RNN) doesn't learn a very simple function (plots shown in the question)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52857213/
