
python - pure-python RNN and theano RNN compute different gradients -- code and results provided


I have been banging my head against this for a while now and cannot figure out what, if anything, I am doing wrong in implementing these RNNs. To spare you the forward phase, I can tell you that the two implementations compute the same outputs, so the forward phase is correct. The problem is in the backward phase.

Here is my python backward-pass code. It follows the style of karpathy's neuraltalk fairly closely, but not exactly:

def backward(self, cache, target, c=leastsquares_cost, dc=leastsquares_dcost):
    '''
    cache is from the forward pass

    c is a cost function
    dc is a function used as dc(output, target) which gives the gradient dc/doutput
    '''
    XdotW = cache['XdotW'] # num_time_steps x hidden_size
    Hin = cache['Hin']     # num_time_steps x hidden_size
    T = Hin.shape[0]
    Hout = cache['Hout']
    Xin = cache['Xin']
    Xout = cache['Xout']

    Oin = cache['Oin']     # num_time_steps x output_size
    Oout = cache['Oout']

    dcdOin = dc(Oout, target) # this will be num_time_steps x num_outputs. these are dc/dO_j

    dcdWho = np.dot(Hout.transpose(), dcdOin) # this is the sum of outer products over all time steps

    # the bias term is added at the end with coefficient 1, hence the dot product is just the sum
    dcdbho = np.sum(dcdOin, axis=0, keepdims=True) # this sums over all time steps

    dcdHout = np.dot(dcdOin, self.Who.transpose()) # dcdHout_ij should be the dot product of dcdOin and the i'th row of Who; this is only the contribution from the outputs
    # now go back in time
    dcdHin = np.zeros(dcdHout.shape)
    # for t=T we can ignore the other term (error from the next timestep). self.df is the derivative of the activation function (here, tanh):
    dcdHin[T-1] = self.df(Hin[T-1]) * dcdHout[T-1] # because we don't need to worry about the next timestep, dcdHout is already correct for t=T

    for t in reversed(xrange(T-1)):
        # we need to add to dcdHout[t] the error coming from the next timestep
        dcdHout[t] += np.dot(dcdHin[t], self.Whh.transpose())
        # now we have the correct form for dcdHout[t]
        dcdHin[t] = self.df(Hin[t]) * dcdHout[t]
    # now we've gone through all t, and we can continue
    dcdWhh = np.zeros(self.Whh.shape)
    for t in range(T-1): # skip the last step because dcdHin[t+1] wouldn't exist there
        dcdWhh += np.outer(Hout[t], dcdHin[t+1])
    # and we can do the bias as well
    dcdbhh = np.sum(dcdHin, axis=0, keepdims=True)

    # now we need to go back to the embeddings
    dcdWxh = np.dot(Xout.transpose(), dcdHin)

    return {'dcdOin': dcdOin, 'dcdWxh': dcdWxh, 'dcdWhh': dcdWhh, 'dcdWho': dcdWho, 'dcdbhh': dcdbhh, 'dcdbho': dcdbho, 'cost': c(Oout, target)}

And here is the theano code (mostly copied from another implementation I found online; I initialize the weights to my pure-python rnn's random weights so that everything is the same):

import theano
import theano.tensor as TT

# input (where first dimension is time)
u = TT.matrix()
# target (where first dimension is time)
t = TT.matrix()
# initial hidden state of the RNN
h0 = TT.vector()
# learning rate
lr = TT.scalar()
# recurrent weights as a shared variable
W = theano.shared(rnn.Whh)
# input to hidden layer weights
W_in = theano.shared(rnn.Wxh)
# hidden to output layer weights
W_out = theano.shared(rnn.Who)

# bias 1
b_h = theano.shared(rnn.bhh[0])
# bias 2
b_o = theano.shared(rnn.bho[0])


# recurrent function (using tanh activation function) and linear output
# activation function
def step(u_t, h_tm1, W, W_in, W_out):
    h_t = TT.tanh(TT.dot(u_t, W_in) + TT.dot(h_tm1, W) + b_h)
    y_t = TT.dot(h_t, W_out) + b_o
    return h_t, y_t

# the hidden state `h` for the entire sequence, and the output for the
# entire sequence `y` (first dimension is always time)
[h, y], _ = theano.scan(step,
                        sequences=u,
                        outputs_info=[h0, None],
                        non_sequences=[W, W_in, W_out])
# error between output and target
error = (.5*(y - t) ** 2).sum()
# gradients on the weights using BPTT
gW, gW_in, gW_out, gb_h, gb_o = TT.grad(error, [W, W_in, W_out, b_h, b_o])
# training function, that computes the error and updates the weights using
# SGD.

Now here is the crazy part. If I run the following:

fn = theano.function([h0, u, t, lr],
                     [error, y, h, gW, gW_in, gW_out, gb_h, gb_o],
                     updates={W: W - lr * gW,
                              W_in: W_in - lr * gW_in,
                              W_out: W_out - lr * gW_out})

er, yout, hout, gWhh, gWhx, gWho, gbh, gbo = fn(numpy.zeros((n,)), numpy.eye(5), numpy.eye(5), .01)
cache = rnn.forward(np.eye(5))
bc = rnn.backward(cache, np.eye(5))

print "sum difference between gWho (theano) and bc['dcdWho'] (pure python):"
print np.sum(gWho - bc['dcdWho'])
print "sum difference between gWhh (theano) and bc['dcdWhh'] (pure python):"
print np.sum(gWhh - bc['dcdWhh'])
print "sum difference between gWhx (theano) and bc['dcdWxh'] (pure python):"
print np.sum(gWhx - bc['dcdWxh'])

print "sum difference between the last row of gWhx (theano) and the last row of bc['dcdWxh'] (pure python):"
print np.sum(gWhx[-1] - bc['dcdWxh'][-1])

I get the following output:

sum difference between gWho (theano) and bc['dcdWho'] (pure python):
-4.59268040265e-16
sum difference between gWhh (theano) and bc['dcdWhh'] (pure python):
0.120527063611
sum difference between gWhx (theano) and bc['dcdWxh'] (pure python):
-0.332613468652
sum difference between the last row of gWhx (theano) and the last row of bc['dcdWxh'] (pure python):
4.33680868994e-18

So I am getting the derivatives of the weight matrix between the hidden layer and the outputs right, but not those of the hidden -> hidden or input -> hidden weight matrices. The crazy part is that I always get the last row of the input -> hidden weight matrix's gradient right. That makes no sense to me; I have no idea what is going on here. Note that the last row of the input -> hidden weight matrix does not correspond to the last time step or anything like that (which would be explained, for example, by me computing the derivatives correctly for the last time step but failing to propagate back through time correctly). dcdWxh is the sum of dcdWxh over all time steps -- so how can I get this one row right and none of the others???
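A finite-difference gradient check would be one way to arbitrate between the two implementations. Here is a minimal sketch, assuming (as above) that rnn.forward returns the cache containing 'Oout' and that rnn.Whh and the other weights are plain numpy arrays that can be perturbed in place:

def numerical_grad(rnn, X, target, W, eps=1e-5):
    # finite-difference gradient of the least-squares cost 0.5*sum((Oout - target)**2)
    # with respect to the array W (one of rnn's weight arrays, perturbed in place and restored)
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        orig = W[idx]
        W[idx] = orig + eps
        c_plus = 0.5 * np.sum((rnn.forward(X)['Oout'] - target) ** 2)
        W[idx] = orig - eps
        c_minus = 0.5 * np.sum((rnn.forward(X)['Oout'] - target) ** 2)
        W[idx] = orig
        grad[idx] = (c_plus - c_minus) / (2 * eps)
        it.iternext()
    return grad

num_gWhh = numerical_grad(rnn, np.eye(5), np.eye(5), rnn.Whh)
print np.max(np.abs(num_gWhh - bc['dcdWhh'])) # pure-python gradient vs numerical
print np.max(np.abs(num_gWhh - gWhh))         # theano gradient vs numerical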

Can anyone help? I am completely out of ideas here.

Best Answer

You should compute the sum of the pointwise absolute values of the difference between the two matrices. Whichever gradient it is, the plain sum can come out close to zero simply because of the particular learning task (are you modelling the zero function? :).
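For example, reusing the gradient variables from the snippet in the question, the comparison could look like this (a minimal sketch):

print "sum |gWho - dcdWho|:", np.sum(np.abs(gWho - bc['dcdWho']))
print "sum |gWhh - dcdWhh|:", np.sum(np.abs(gWhh - bc['dcdWhh']))
print "sum |gWhx - dcdWxh|:", np.sum(np.abs(gWhx - bc['dcdWxh']))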

The last row presumably carries the weights coming from a constant neuron, i.e. the bias, so it seems you always get the bias right (but check the sum of absolute values there as well).
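To see which rows of the input -> hidden gradient actually agree (and whether it really is just one bias-like row), a per-row comparison of the same variables helps (a sketch):

# sum of absolute differences per row of Wxh's gradient; only rows that truly agree come out near zero
print np.sum(np.abs(gWhx - bc['dcdWxh']), axis=1)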

It also looks like the row-major and column-major layouts of the matrices are being mixed up, as in

gWhx - bc['dcdWxh']

which reads like the weights going from "hidden to x" are being compared against the weights going from "x to hidden".
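One way to test for such a transposition is to compare against the transposed matrix as well (a sketch using the same variables; only meaningful where the shapes permit):

# Whh is square, so its gradient can always be compared against the transpose
print np.sum(np.abs(gWhh - bc['dcdWhh'].T))
# Wxh's gradient is only shape-compatible with its transpose if input and hidden sizes match
if gWhx.shape == bc['dcdWxh'].T.shape:
    print np.sum(np.abs(gWhx - bc['dcdWxh'].T))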

I would rather have posted this as a comment, but I lack the reputation to do so. Sorry!

Regarding "python - pure-python RNN and theano RNN compute different gradients -- code and results provided", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27544698/
