python - Adapting PyTorch's "NLP from Scratch" tutorial for a bidirectional GRU


I have taken the code from the tutorial and tried to adapt it to support bidirectionality and an arbitrary number of GRU layers.

Link to the tutorial, which uses a unidirectional, single-layer GRU:
https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

The model works fine as-is, but when I set bidirectional=True I get a size mismatch error (shown below). Any idea why that happens?
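
For reference, here is a quick standalone check (not part of the tutorial) of how bidirectional=True changes the GRU's output and hidden-state shapes when the per-direction hidden size is halved, which is what my encoder below does:

import torch
import torch.nn as nn

hidden_size = 256
gru_uni = nn.GRU(hidden_size, hidden_size, num_layers=1, bidirectional=False)
gru_bi = nn.GRU(hidden_size, hidden_size // 2, num_layers=1, bidirectional=True)

x = torch.zeros(1, 1, hidden_size)   # (seq_len, batch, input_size)
out_uni, h_uni = gru_uni(x)
out_bi, h_bi = gru_bi(x)

print(out_uni.shape, h_uni.shape)  # torch.Size([1, 1, 256]) torch.Size([1, 1, 256])
print(out_bi.shape, h_bi.shape)    # torch.Size([1, 1, 256]) torch.Size([2, 1, 128])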

Encoder:

import torch.nn.init as init

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers=1, bidirectional=False):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.hidden_var = hidden_size // 2 if bidirectional else hidden_size
        self.n_layers = n_layers
        self.bidirectional = bidirectional
        self.n_directions = 2 if bidirectional else 1

        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size,
                          self.hidden_var,
                          num_layers=self.n_layers,
                          bidirectional=self.bidirectional)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output = embedded
        output, hidden = self.gru(output, hidden)
        # output = (output[:, :, :self.hidden_size] +
        #           output[:, :, self.hidden_size:])
        return output, hidden

    def initHidden(self):
        return torch.zeros(self.n_layers * self.n_directions, 1, self.hidden_var, device=device)

AttnDecoder:
class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, n_layers=1, dropout_p=0.1, max_length=MAX_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length
        self.n_layers = n_layers

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)

        self.gru = nn.GRU(self.hidden_size,
                          self.hidden_size,
                          num_layers=self.n_layers)

        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs):
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))

        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = self.attn_combine(output).unsqueeze(0)

        output = F.relu(output)
        output, hidden = self.gru(output, hidden)

        output = F.log_softmax(self.out(output[0]), dim=1)

        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1 * self.n_layers, 1, self.hidden_size, device=device)

Everything else from the tutorial stays exactly the same, except for this block (to account for the new parameters):
n_layers=1
bidirectional = True
hidden_size = 256
encoder1 = EncoderRNN(input_lang.n_words, hidden_size, n_layers=n_layers, bidirectional=bidirectional).to(device)
attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1, n_layers=n_layers).to(device)
trainIters(encoder1, attn_decoder1, 75000, print_every=5000)

Error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-133-37084c93a197> in <module>
5 attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1, n_layers=n_layers).to(device)
6
----> 7 trainIters(encoder1, attn_decoder1, 75000, print_every=5000)

<ipython-input-131-774ce8edefa6> in trainIters(encoder, decoder, n_iters, print_every, plot_every, learning_rate)
16
17 loss = train(input_tensor, target_tensor, encoder,
---> 18 decoder, encoder_optimizer, decoder_optimizer, criterion)
19 print_loss_total += loss
20 plot_loss_total += loss

<ipython-input-130-67be7e8c2a58> in train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, max_length)
39 for di in range(target_length):
40 decoder_output, decoder_hidden, decoder_attention = decoder(
---> 41 decoder_input, decoder_hidden, encoder_outputs)
42 topv, topi = decoder_output.topk(1)
43 decoder_input = topi.squeeze().detach() # detach from history as input

~/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
545 result = self._slow_forward(*input, **kwargs)
546 else:
--> 547 result = self.forward(*input, **kwargs)
548 for hook in self._forward_hooks.values():
549 hook_result = hook(self, input, result)

<ipython-input-129-6dd1d30fe28f> in forward(self, input, hidden, encoder_outputs)
24
25 attn_weights = F.softmax(
---> 26 self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
27 attn_applied = torch.bmm(attn_weights.unsqueeze(0),
28 encoder_outputs.unsqueeze(0))

~/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
545 result = self._slow_forward(*input, **kwargs)
546 else:
--> 547 result = self.forward(*input, **kwargs)
548 for hook in self._forward_hooks.values():
549 hook_result = hook(self, input, result)

~/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/linear.py in forward(self, input)
85
86 def forward(self, input):
---> 87 return F.linear(input, self.weight, self.bias)
88
89 def extra_repr(self):

~/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/functional.py in linear(input, weight, bias)
1367 if input.dim() == 2 and bias is not None:
1368 # fused op is marginally faster
-> 1369 ret = torch.addmm(bias, input, weight.t())
1370 else:
1371 output = input.matmul(weight.t())

RuntimeError: size mismatch, m1: [1 x 384], m2: [512 x 10] at /tmp/pip-req-build-58y_cjjl/aten/src/TH/generic/THTensorMath.cpp:752

Any help would be appreciated!
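
My own reading of the numbers in the error (the m2: [512 x 10] matches self.attn with hidden_size=256 and MAX_LENGTH=10): embedded[0] is (1, 256), but because hidden_var is halved to 128 in the bidirectional case, hidden[0] is only (1, 128), so the concatenation is 384 wide while self.attn expects 512. A minimal reproduction with dummy tensors:

import torch
import torch.nn as nn

embedded0 = torch.zeros(1, 256)   # embedding output, width = hidden_size
hidden0 = torch.zeros(1, 128)     # one layer/direction of hidden, width = hidden_var
attn = nn.Linear(256 * 2, 10)     # self.attn: expects width 512 (max_length = 10)

cat = torch.cat((embedded0, hidden0), 1)
print(cat.shape)                  # torch.Size([1, 384])
attn(cat)                         # raises the same size-mismatch RuntimeError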

Update based on user3923920's comment (the encoder/decoder now also includes an LSTM option and works with bidirectionality):

New, working, adapted encoder:
class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers=1, bidirectional=False, method='GRU'):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.hidden_var = hidden_size // 2 if bidirectional else hidden_size
        self.n_layers = n_layers
        self.bidirectional = bidirectional
        self.n_directions = 2 if bidirectional else 1
        self.method = method

        self.embedding = nn.Embedding(input_size, hidden_size)
        if self.method == 'GRU':
            self.net = nn.GRU(hidden_size,
                              self.hidden_var,
                              num_layers=self.n_layers,
                              bidirectional=self.bidirectional)
        elif self.method == 'LSTM':
            self.net = nn.LSTM(hidden_size,
                               self.hidden_var,
                               num_layers=self.n_layers,
                               bidirectional=self.bidirectional)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output = embedded
        output, hidden = self.net(output, hidden)
        # output = (output[:, :, :self.hidden_size] +
        #           output[:, :, self.hidden_size:])
        return output, hidden, embedded

    def initHidden(self):
        if self.method == 'GRU':
            return torch.zeros(self.n_layers * self.n_directions, 1, self.hidden_var, device=device)
        elif self.method == 'LSTM':
            h_state = torch.zeros(self.n_layers * self.n_directions, 1, self.hidden_var)
            c_state = torch.zeros(self.n_layers * self.n_directions, 1, self.hidden_var)
            hidden = (h_state, c_state)
            return hidden
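
As a quick sanity check of the LSTM branch (my own snippet, assuming hidden_size=256, n_layers=1, bidirectional=True): the hidden state is an (h, c) tuple whose pieces each have shape (n_layers * 2, 1, 128), while the output stays 256 wide:

import torch
import torch.nn as nn

hidden_var = 256 // 2
lstm = nn.LSTM(256, hidden_var, num_layers=1, bidirectional=True)

h0 = torch.zeros(2, 1, hidden_var)
c0 = torch.zeros(2, 1, hidden_var)
out, (hn, cn) = lstm(torch.zeros(1, 1, 256), (h0, c0))

print(out.shape, hn.shape, cn.shape)
# torch.Size([1, 1, 256]) torch.Size([2, 1, 128]) torch.Size([2, 1, 128])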

New, working, adapted decoder:
class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, n_layers=1, dropout_p=0.1,
                 max_length=MAX_LENGTH, method='GRU', bidirectional=False):

        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length
        self.n_layers = n_layers
        self.method = method
        self.bidirectional = bidirectional

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)

        if self.method == 'GRU':
            self.net = nn.GRU(self.hidden_size,
                              self.hidden_size,
                              num_layers=self.n_layers)
        elif self.method == 'LSTM':
            self.net = nn.LSTM(self.hidden_size,
                               self.hidden_size,
                               num_layers=self.n_layers)

        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs):

        # Embed
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)
        self.hidden = hidden

        # Concatenate all of the layers
        hidden_h_rows = ()
        hidden_c_rows = ()

        if self.method == 'LSTM':
            # hidden is a tuple of h_state and c_state
            decoder_h, decoder_c = hidden
            print(decoder_h.shape)
            hidden_shape = decoder_h.shape[0]

            # h_state
            for x in range(0, hidden_shape):
                hidden_h_rows += (decoder_h[x],)

            # c_state
            for x in range(0, hidden_shape):
                hidden_c_rows += (decoder_c[x],)

        elif self.method == "GRU":
            # hidden is not a tuple (GRU)
            decoder_h = hidden
            hidden_shape = decoder_h.shape[0]

            # h_state
            for x in range(0, hidden_shape):
                hidden_h_rows += (decoder_h[x],)

        if self.bidirectional:
            decoder_h_cat = torch.cat(hidden_h_rows, 1)
            # Make sure the h_dim size is compatible with num_layers with concatenation.
            decoder_h = decoder_h_cat.view((self.n_layers, 1, self.hidden_size))  # hidden_size=256

        if self.method == "LSTM":
            decoder_c_cat = torch.cat(hidden_c_rows, 1)
            decoder_c = decoder_c_cat.view((self.n_layers, 1, self.hidden_size))  # hidden_size=256
            hidden_lstm = (decoder_h, decoder_c)
        elif self.method == "GRU":
            hidden_gru = decoder_h

        # Attention Block
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0],
                                 hidden_lstm[0][0] if self.method == "LSTM" else hidden_gru[0]),
                                1)),
            dim=1)
        attn_applied = torch.bmm(attn_weights.unsqueeze(0), encoder_outputs.unsqueeze(0))
        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = self.attn_combine(output).unsqueeze(0)

        output = F.relu(output)
        output, hidden = self.net(output,
                                  hidden_lstm if self.method == "LSTM" else hidden_gru)  # I am not sure about this!
        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights

    def initHidden(self):
        # The decoder always runs at the full hidden_size (it defines no hidden_var).
        if self.method == 'GRU':
            return torch.zeros(self.n_layers * 1, 1, self.hidden_size, device=device)
        elif self.method == 'LSTM':
            h_state = torch.zeros(self.n_layers * 1, 1, self.hidden_size)
            c_state = torch.zeros(self.n_layers * 1, 1, self.hidden_size)
            hidden = (h_state, c_state)
            return hidden
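
A minimal sketch of how I wire these adapted classes back into the tutorial's setup. It assumes the tutorial's input_lang, output_lang, device and MAX_LENGTH are defined, and that train()/trainIters() have been adjusted to unpack the encoder's three return values (and, for the GRU-only version above, to reshape the bidirectional hidden state as in the accepted answer below):

# Hypothetical wiring; relies on globals defined in the tutorial.
hidden_size = 256
n_layers = 1

encoder1 = EncoderRNN(input_lang.n_words, hidden_size, n_layers=n_layers,
                      bidirectional=True, method='LSTM').to(device)
attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, n_layers=n_layers,
                               dropout_p=0.1, method='LSTM', bidirectional=True).to(device)

trainIters(encoder1, attn_decoder1, 75000, print_every=5000)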

Best Answer

So I am not sure this is 100% correct, since I am just learning how to program RNNs myself, but I changed my code in a couple of additional places.

For one, you will notice the error says m1: [1x384], so the result of

torch.cat((embedded[0], hidden[0]), 1)

does not have a final dimension of 512, the expected input size, when it is passed through the attn weight layer. That is because hidden is a tensor of shape [2, 1, 256] rather than something like [1, 1, 512]. Since your sizes don't exactly match mine, I'm not sure what is different, but in train(...), where the tutorial just sets

decoder_hidden = encoder_hidden

I instead do

decoder_hidden = torch.cat((encoder_hidden[0], encoder_hidden[1]), 1)
decoder_hidden = decoder_hidden.view((1, 1, 512))
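
As a quick sanity check of that reshaping (assuming a 1-layer bidirectional encoder whose hidden state has shape [2, 1, 256]), concatenating the two directional states stitches them back into the [1, 1, 512] shape the decoder expects:

import torch

encoder_hidden = torch.randn(2, 1, 256)   # (num_layers * num_directions, batch, hidden)

decoder_hidden = torch.cat((encoder_hidden[0], encoder_hidden[1]), 1)  # (1, 512)
decoder_hidden = decoder_hidden.view((1, 1, 512))
print(decoder_hidden.shape)               # torch.Size([1, 1, 512])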

Hope this helps in some way.

Regarding "python - Adapting PyTorch's "NLP from Scratch" tutorial for a bidirectional GRU", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58996451/
