
python - MLP Neural Network: calculating the gradient (matrices)

Reposted. Author: 太空宇宙. Last updated: 2023-11-04 03:58:30

What is the best way to implement the gradient calculation in an n-layer neural network?

Weight layers:

  1. First layer weights: (n_inputs+1, n_units_layer)-matrix
  2. Hidden layer weights: (n_units_layer+1, n_units_layer)-matrix
  3. Last layer weights: (n_units_layer+1, n_outputs)-matrix
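
The extra +1 row in each shape is presumably the bias weight. A minimal sketch of a forward pass through layers of these shapes, assuming a column of ones is appended to each layer's input and a tanh activation is used (both are assumptions, the question names neither; `add_bias_column` and `forward_all` are hypothetical helper names):

```python
import numpy as np

def add_bias_column(X):
    """Append a column of ones so the last weight row acts as the bias."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def forward_all(X, weight_layers, activation=np.tanh):
    """Return the output of every layer for X of shape (n_samples, n_inputs)."""
    outputs = []
    for W in weight_layers:
        X = activation(add_bias_column(X).dot(W))
        outputs.append(X)
    return outputs

rng = np.random.default_rng(0)
n_inputs, n_units_layer, n_outputs = 2, 3, 1
weight_layers = [
    rng.normal(size=(n_inputs + 1, n_units_layer)),   # first layer
    rng.normal(size=(n_units_layer + 1, n_outputs)),  # last layer
]
X = np.array([[0, 0], [1, 1], [1, 0], [0, 1]], dtype=float)
outputs = forward_all(X, weight_layers)
print([out.shape for out in outputs])
```

With one hidden layer this uses exactly two weight matrices; more hidden layers would add matrices of shape (n_units_layer+1, n_units_layer) in between.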

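Notes:

  • If there is only one hidden layer, we represent the network with just two (weight) layers:
    input --first_layer-> network_unit --second_layer-> output
  • For an n-layer network with more than one hidden layer, we need to implement step (2).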

Somewhat vague pseudocode:

    weight_layers = [ layer1, layer2 ]                           # a list of layers as described above
    input_values  = numpy.array([ [0,0], [1,1], [1,0], [0,1] ])  # our test set (corresponds to XOR)
    target_output = numpy.array([ [0], [0], [1], [1] ])          # what we want to train our net to output
    output_layers = []                                           # output for the corresponding layers

    for layer in weight_layers:
        output <-- calculate the output    # calculate the output from the current layer
        output_layers <-- output           # store the output from each layer

    n_samples = input_values.shape[0]
    n_outputs = target_output.shape[1]

    error = ( output-target_output )/( n_samples*n_outputs )

    """ calculate the gradient here """

Final implementation

The final implementation is available on GitHub.

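Best answer

With Python and numpy this is easy.

You have two options:

  1. You can compute everything in parallel for num_instances instances, or
  2. You can compute the gradient for a single instance (which is really a special case of 1).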
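
To illustrate why option 2 is a special case of option 1, here is a small sketch that computes the pre-activations of one layer both ways and compares them. The shapes follow the answer's convention (W: [num_outputs, num_inputs]); the sizes and random data are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
num_instances, num_inputs, num_outputs = 4, 3, 2
X = rng.normal(size=(num_instances, num_inputs))
W = rng.normal(size=(num_outputs, num_inputs))
b = rng.normal(size=num_outputs)

# Option 1: all instances at once.
A_batch = X.dot(W.T) + b

# Option 2: one instance at a time, i.e. a batch that has a single row.
A_single = np.vstack([X[i:i + 1].dot(W.T) + b for i in range(num_instances)])

same = np.allclose(A_batch, A_single)
print(same)
```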

I will now give some hints on how to implement option 1. I suggest you create a new class called Layer. It should have two functions:

    forward:
        inputs:
            X: shape = [num_instances, num_inputs]
                inputs
            W: shape = [num_outputs, num_inputs]
                weights
            b: shape = [num_outputs]
                biases
            g: function
                activation function
        outputs:
            Y: shape = [num_instances, num_outputs]
                outputs

    backprop:
        inputs:
            dE/dY: shape = [num_instances, num_outputs]
                backpropagated gradient
            W: shape = [num_outputs, num_inputs]
                weights
            b: shape = [num_outputs]
                biases
            gd: function
                calculates the derivative of g(A) = Y
                based on Y, i.e. gd(Y) = g'(A)
            Y: shape = [num_instances, num_outputs]
                outputs
            X: shape = [num_instances, num_inputs]
                inputs
        outputs:
            dE/dX: shape = [num_instances, num_inputs]
                will be backpropagated (dE/dY of lower layer)
            dE/dW: shape = [num_outputs, num_inputs]
                accumulated derivative with respect to weights
            dE/db: shape = [num_outputs]
                accumulated derivative with respect to biases

The implementation is simple:

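    def forward(X, W, b, g):
        A = X.dot(W.T) + b  # b is broadcast across the rows (instances)
        Y = g(A)
        return Y

    def backprop(dEdY, W, b, gd, Y, X):
        Deltas = gd(Y) * dEdY      # element-wise multiplication, gives dE/dA
        dEdX = Deltas.dot(W)       # gradient passed on to the previous layer
        dEdW = Deltas.T.dot(X)     # accumulated over all instances
        dEdb = Deltas.sum(axis=0)  # accumulated over all instances
        return dEdX, dEdW, dEdb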
The X of the first layer is taken from the dataset; in the forward pass, each layer's Y is then passed on as the X of the next layer.
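
The forward chaining described here might look like the following sketch. It assumes a hypothetical 2-3-1 network with tanh activations and the XOR inputs, none of which are fixed by the answer. Each layer's input is stored because backprop needs it later.

```python
import numpy as np

def forward(X, W, b, g):
    return g(X.dot(W.T) + b)

rng = np.random.default_rng(0)
# Hypothetical 2-3-1 network; each entry is (W, b) with W: [num_outputs, num_inputs].
params = [
    (rng.normal(size=(3, 2)), np.zeros(3)),
    (rng.normal(size=(1, 3)), np.zeros(1)),
]
X = np.array([[0, 0], [1, 1], [1, 0], [0, 1]], dtype=float)

inputs, outputs = [], []
for W, b in params:
    inputs.append(X)               # the X seen by this layer, needed by backprop
    X = forward(X, W, b, np.tanh)
    outputs.append(X)              # this layer's Y becomes the next layer's X

print([out.shape for out in outputs])
```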

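The dE/dY of the output layer is computed as Y-T (this is the derivative of, for example, the sum-of-squared-errors loss E = 1/2 Σ (Y-T)²), where Y is the output of the network (shape = [num_instances, num_outputs]) and T (shape = [num_instances, num_outputs]) is the desired output. Then you can backpropagate: the dE/dX of each layer is the dE/dY of the previous layer.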
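
One way to convince yourself that the backward pass is right is to compare it with a finite-difference estimate. The sketch below assumes a hypothetical 2-3-1 network with a tanh hidden layer, a linear output layer, and the loss E = 1/2 Σ (Y-T)², so that dE/dY = Y-T at the output. The network size, activations, and `eps` are illustrative choices, not part of the answer.

```python
import numpy as np

def forward(X, W, b, g):
    return g(X.dot(W.T) + b)

def backprop(dEdY, W, b, gd, Y, X):
    Deltas = gd(Y) * dEdY
    return Deltas.dot(W), Deltas.T.dot(X), Deltas.sum(axis=0)

def tanh_d(Y):
    return 1.0 - Y ** 2          # derivative of tanh, expressed through Y

def identity(A):
    return A

def identity_d(Y):
    return np.ones_like(Y)

rng = np.random.default_rng(0)
X = np.array([[0, 0], [1, 1], [1, 0], [0, 1]], dtype=float)
T = np.array([[0], [0], [1], [1]], dtype=float)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)

def loss(W1_):
    Y1 = forward(X, W1_, b1, np.tanh)
    Y2 = forward(Y1, W2, b2, identity)
    return 0.5 * np.sum((Y2 - T) ** 2)

# Forward pass, then backward pass from the output layer down.
Y1 = forward(X, W1, b1, np.tanh)
Y2 = forward(Y1, W2, b2, identity)
dEdY2 = Y2 - T
dEdY1, dEdW2, dEdb2 = backprop(dEdY2, W2, b2, identity_d, Y2, Y1)
dEdX, dEdW1, dEdb1 = backprop(dEdY1, W1, b1, tanh_d, Y1, X)

# Central finite-difference estimate of dE/dW1 for comparison.
eps = 1e-6
numeric = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

max_abs_diff = np.max(np.abs(numeric - dEdW1))
print(max_abs_diff)
```

The same check can be repeated for W2 and the biases by perturbing those parameters instead.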

Now you can use the dE/dW and dE/db of each layer to update W and b.
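
The answer does not say which update rule to use. As a rough sketch, plain batch gradient descent could look like this, assuming a hypothetical 2-4-1 network, the error scaling from the question (division by n_samples*n_outputs), and an arbitrarily chosen learning rate of 0.1:

```python
import numpy as np

def forward(X, W, b, g):
    return g(X.dot(W.T) + b)

def backprop(dEdY, W, b, gd, Y, X):
    Deltas = gd(Y) * dEdY
    return Deltas.dot(W), Deltas.T.dot(X), Deltas.sum(axis=0)

rng = np.random.default_rng(0)
X = np.array([[0, 0], [1, 1], [1, 0], [0, 1]], dtype=float)
T = np.array([[0], [0], [1], [1]], dtype=float)

# Each layer is [W, b, g, gd]; a hypothetical 2-4-1 network.
layers = [
    [rng.normal(size=(4, 2)), np.zeros(4), np.tanh, lambda Y: 1.0 - Y ** 2],
    [rng.normal(size=(1, 4)), np.zeros(1), lambda A: A, lambda Y: np.ones_like(Y)],
]
learning_rate = 0.1  # assumed value, not taken from the answer
losses = []
for step in range(2000):
    # Forward pass, remembering every layer's input and output.
    activations = [X]
    for W, b, g, gd in layers:
        activations.append(forward(activations[-1], W, b, g))
    Y = activations[-1]
    losses.append(0.5 * np.mean((Y - T) ** 2))

    # Backward pass from the last layer to the first, updating along the way.
    dEdY = (Y - T) / Y.size
    for i in reversed(range(len(layers))):
        W, b, g, gd = layers[i]
        dEdY, dEdW, dEdb = backprop(dEdY, W, b, gd, activations[i + 1], activations[i])
        layers[i][0] = W - learning_rate * dEdW
        layers[i][1] = b - learning_rate * dEdb

print(losses[0], losses[-1])
```

Note that dE/dX is computed from the old W before that layer's weights are replaced, so the gradient passed down is consistent with the forward pass.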

Here is an example in C++: OpenANN.

By the way, you can compare the speed of instance-wise and batch-wise forward propagation:

    In [1]: import timeit

    In [2]: setup = """import numpy
       ...: W = numpy.random.rand(10, 5000)
       ...: X = numpy.random.rand(1000, 5000)"""

    In [3]: timeit.timeit('[W.dot(x) for x in X]', setup=setup, number=10)
    Out[3]: 0.5420958995819092

    In [4]: timeit.timeit('X.dot(W.T)', setup=setup, number=10)
    Out[4]: 0.22001314163208008

Regarding python - MLP Neural Network: calculating the gradient (matrices), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17049321/
