
python - `for` loop to a multi dimensional array in PyTorch


I want to implement a question answering system with an attention mechanism. I have two inputs, context and query, whose shapes are (batch_size, context_seq_len, embd_size) and (batch_size, query_seq_len, embd_size) respectively.
I am following the paper below: Machine Comprehension Using Match-LSTM and Answer Pointer, https://arxiv.org/abs/1608.07905

Then I want to obtain an attention matrix of shape (batch_size, context_seq_len, query_seq_len, embd_size). In the paper, they compute the values row by row (that is, once per context word; G_i and alpha_i in the paper).
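Roughly, equation (2) of the paper computes each row along the following lines; this is only an illustrative sketch with dummy shapes, not my actual code:

import torch
import torch.nn.functional as F

bs, J, d = 2, 5, 8                    # batch size, query length, embedding size
wh_q   = torch.randn(bs, J, d)        # W^q applied to all query hidden states
wh_p_i = torch.randn(bs, d)           # W^p applied to context hidden state i
wh_r_i = torch.randn(bs, d)           # W^r applied to the previous Match-LSTM hidden state

G_i     = torch.tanh(wh_q + (wh_p_i + wh_r_i).unsqueeze(1))  # (bs, J, d), broadcasts over J
w       = torch.randn(d, 1)
alpha_i = F.softmax(G_i.matmul(w).squeeze(-1), dim=1)        # (bs, J), attention over query words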

My code is below and it runs, but I am not sure whether my approach is good. For example, I use a for loop to generate the sequence data (for i in range(T):), and to fill in each row I use in-place indexing such as G[:,i,:,:] and embd_context[:,i,:].clone(). Is this a good way to do it in PyTorch? If not, where should I change the code?
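For concreteness, the two patterns I am weighing look roughly like this (dummy tensors, not the real model):

import torch

bs, T, J, d = 2, 4, 3, 5

# (a) preallocate the full tensor and write each row in place (what my code does)
G = torch.zeros(bs, T, J, d)
for i in range(T):
    G[:, i, :, :] = torch.randn(bs, J, d)  # in the real model this would be the computed G_i

# (b) collect the rows in a Python list and stack once at the end
rows = [torch.randn(bs, J, d) for _ in range(T)]
G_alt = torch.stack(rows, dim=1)  # (bs, T, J, d)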

Also, if you notice any other points, please let me know. I am new to this field and to PyTorch, so I apologize if my question is vague.

import torch
import torch.nn as nn
import torch.nn.functional as F

# WordEmbedding, PointerNetwork and to_var are project helpers defined elsewhere.

class MatchLSTM(nn.Module):
    def __init__(self, args):
        super(MatchLSTM, self).__init__()
        self.embd_size = args.embd_size
        d = self.embd_size
        self.answer_token_len = args.answer_token_len

        self.embd = WordEmbedding(args)
        self.ctx_rnn = nn.GRU(d, d, dropout=0.2)
        self.query_rnn = nn.GRU(d, d, dropout=0.2)

        self.ptr_net = PointerNetwork(d, d, self.answer_token_len)  # TBD

        self.w  = nn.Parameter(torch.rand(1, d, 1).type(torch.FloatTensor), requires_grad=True)  # (1, d, 1)
        self.Wq = nn.Parameter(torch.rand(1, d, d).type(torch.FloatTensor), requires_grad=True)  # (1, d, d)
        self.Wp = nn.Parameter(torch.rand(1, d, d).type(torch.FloatTensor), requires_grad=True)  # (1, d, d)
        self.Wr = nn.Parameter(torch.rand(1, d, d).type(torch.FloatTensor), requires_grad=True)  # (1, d, d)

        self.match_lstm_cell = nn.LSTMCell(2*d, d)

    def forward(self, context, query):
        # params
        d = self.embd_size
        bs = context.size(0)  # batch size
        T = context.size(1)   # context length
        J = query.size(1)     # query length

        # LSTM Preprocessing Layer
        shape = (bs, T, J, d)
        embd_context = self.embd(context)  # (N, T, d)
        embd_context, _h = self.ctx_rnn(embd_context)  # (N, T, d)
        embd_context_ex = embd_context.unsqueeze(2).expand(shape).contiguous()  # (N, T, J, d)
        embd_query = self.embd(query)  # (N, J, d)
        embd_query, _h = self.query_rnn(embd_query)  # (N, J, d)
        embd_query_ex = embd_query.unsqueeze(1).expand(shape).contiguous()  # (N, T, J, d)

        # Match-LSTM layer
        G = to_var(torch.zeros(bs, T, J, d))  # (N, T, J, d)

        wh_q = torch.bmm(embd_query, self.Wq.expand(bs, d, d))  # (N, J, d) = (N, J, d)(N, d, d)

        hidden = to_var(torch.randn([bs, d]))  # (N, d)
        cell_state = to_var(torch.randn([bs, d]))  # (N, d)
        # TODO bidirectional
        H_r = [hidden]
        for i in range(T):
            wh_p_i = torch.bmm(embd_context[:,i,:].clone().unsqueeze(1), self.Wp.expand(bs, d, d)).squeeze()  # (N, 1, d) -> (N, d)
            wh_r_i = torch.bmm(hidden.unsqueeze(1), self.Wr.expand(bs, d, d)).squeeze()  # (N, 1, d) -> (N, d)
            sec_elm = (wh_p_i + wh_r_i).unsqueeze(1).expand(bs, J, d)  # (N, J, d)

            G[:,i,:,:] = F.tanh((wh_q + sec_elm).view(-1, d)).view(bs, J, d)  # (N, J, d) # TODO bias

            attn_i = torch.bmm(G[:,i,:,:].clone(), self.w.expand(bs, d, 1)).squeeze()  # (N, J)
            attn_query = torch.bmm(attn_i.unsqueeze(1), embd_query).squeeze()  # (N, d)
            z = torch.cat((embd_context[:,i,:], attn_query), 1)  # (N, 2d)

            hidden, cell_state = self.match_lstm_cell(z, (hidden, cell_state))  # (N, d), (N, d)
            H_r.append(hidden)
        H_r = torch.stack(H_r, dim=1)  # (N, T, d)

        indices = self.ptr_net(H_r)  # (N, M, T), where M means (start, end)
        return indices

Best Answer

I think your code is fine. You cannot avoid the loop for i in range(T): because in equation (2) of the paper (https://openreview.net/pdf?id=B1-q5Pqxl) there is a hidden state coming from the Match-LSTM cell that takes part in computing the G_i and alpha_i vectors, and those are then used to compute the input to the Match-LSTM at the next time step. So you need to run the loop for every time step of the Match-LSTM; I don't see any way to avoid the for loop here.
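To make the dependency concrete, here is a minimal self-contained sketch (dummy dimensions, random weights, not your actual model): the attention at step i is computed from the hidden state produced at step i-1, so the steps cannot be vectorized away.

import torch
import torch.nn as nn
import torch.nn.functional as F

bs, T, J, d = 2, 7, 5, 8
embd_context = torch.randn(bs, T, d)   # stands in for the preprocessed context states
embd_query   = torch.randn(bs, J, d)   # stands in for the preprocessed query states
cell = nn.LSTMCell(2 * d, d)

hidden     = torch.zeros(bs, d)
cell_state = torch.zeros(bs, d)
outputs = []
for i in range(T):
    # attention over query words depends on the *current* Match-LSTM hidden state
    scores = torch.einsum('bjd,bd->bj', embd_query, hidden + embd_context[:, i, :])
    alpha  = F.softmax(scores, dim=1)                                   # (bs, J)
    attn_query = torch.bmm(alpha.unsqueeze(1), embd_query).squeeze(1)   # (bs, d)
    z = torch.cat((embd_context[:, i, :], attn_query), dim=1)           # (bs, 2d)
    hidden, cell_state = cell(z, (hidden, cell_state))                  # feeds the next step
    outputs.append(hidden)
H_r = torch.stack(outputs, dim=1)  # (bs, T, d)

One minor, optional change that does not remove the loop: instead of writing into a preallocated G and calling .clone(), you could keep the per-step G_i and hidden states in Python lists and torch.stack them once after the loop, which sidesteps the in-place assignment question entirely.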

Regarding python - `for` loop to a multi dimensional array in PyTorch, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47417159/
