
python - N-D convolution backpropagation

Reposted. Author: 行者123. Updated: 2023-12-03 15:51:30

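For my own education, I am trying to implement an N-dimensional convolutional layer in a convolutional neural network.

I would like to implement a backpropagation function. However, I am not sure of the most efficient way to do so.

At present, I am using signal.fftconvolve to:

  • in the forward step, convolve the input with the kernel of each filter, iterating over all filters;
  • in the backpropagation step, convolve the derivatives (reversed in all dimensions with the FlipAllAxes function) with the array ( https://jefkine.com/general/2016/09/05/backpropagation-in-convolutional-neural-networks/ ) over all filters, and sum them. I take the output to be the sum of each image convolved with each derivative for each filter.

I am particularly confused about how to convolve the derivatives. Using the class below to backpropagate results in an explosion in the size of the weights.

What is the correct way to program the convolution of the derivative with the output and the filters?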
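
The flipping step described above relies on a standard identity: convolving with a kernel reversed along every axis is the same as cross-correlating with the original kernel. A small sketch to confirm this numerically, assuming random 3-D test arrays and a standalone `flip_all_axes` helper (a hypothetical name that mirrors the `FlipAllAxes` method):

```python
import numpy as np
from scipy import signal

def flip_all_axes(a):
    """Reverse an array along every axis (same idea as FlipAllAxes)."""
    return a[(slice(None, None, -1),) * a.ndim]

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 5, 4))   # arbitrary 3-D test input
kernel = rng.standard_normal((3, 2, 2))  # arbitrary 3-D test kernel

# Convolution with the fully flipped kernel ...
conv_flipped = signal.fftconvolve(image, flip_all_axes(kernel), mode="valid")
# ... compared with direct cross-correlation using the original kernel.
corr = signal.correlate(image, kernel, mode="valid", method="direct")

identity_holds = np.allclose(conv_flipped, corr)
# np.flip with no axis argument flips every axis as well.
matches_np_flip = np.array_equal(flip_all_axes(kernel), np.flip(kernel))
print(identity_holds, matches_np_flip)
```

The "valid" mode is used here only to keep the comparison free of boundary effects; the layer in the question uses "same".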

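Edit:

According to this paper ( Fast Training of Convolutional Networks through FFTs ), which seeks to do exactly what I want to do:
  • The derivatives for the previous layer are given by the convolution of the derivatives of the current layer with the weights: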

    dL/dy_f = dL/dx * w_f^T
  • The derivative for the weights is the piecewise sum of the convolution of the derivatives with the original input:

    dL/dy = dL/dx * x

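To the best of my knowledge, I have implemented this below. However, it does not seem to give the intended result: the network I have written using this layer exhibits wild fluctuations during training.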
    import numbers

    import numpy as np
    from numpy.lib.stride_tricks import as_strided
    from scipy import signal

    # NumPy is used throughout: scipy.signal.fftconvolve operates on NumPy arrays.


    class ConvNDLayer:
        def __init__(self, channels, kernel_size, dim):
            self.channels = channels
            self.kernel_size = kernel_size
            self.dim = dim

            self.last_input = None

            # Filter bank of shape (channels, kernel_size, ..., kernel_size)
            self.filt_dims = np.ones(dim + 1).astype(int)
            self.filt_dims[1:] = self.filt_dims[1:] * kernel_size
            self.filt_dims[0] = self.filt_dims[0] * channels
            self.filters = np.random.randn(*self.filt_dims) / (kernel_size) ** dim

        def FlipAllAxes(self, array):
            # Reverse the array along every axis
            sl = slice(None, None, -1)
            return array[tuple([sl] * array.ndim)]

        def ViewAsWindows(self, array, window_shape, step=1):
            # -- basic checks on arguments
            if not isinstance(array, np.ndarray):
                raise TypeError("`array` must be a NumPy ndarray")
            ndim = array.ndim
            if isinstance(window_shape, numbers.Number):
                window_shape = (window_shape,) * ndim
            if not (len(window_shape) == ndim):
                raise ValueError("`window_shape` is incompatible with `arr_in.shape`")

            if isinstance(step, numbers.Number):
                if step < 1:
                    raise ValueError("`step` must be >= 1")
                step = (step,) * ndim
            if len(step) != ndim:
                raise ValueError("`step` is incompatible with `arr_in.shape`")

            arr_shape = np.array(array.shape)
            window_shape = np.array(window_shape, dtype=arr_shape.dtype)

            if ((arr_shape - window_shape) < 0).any():
                raise ValueError("`window_shape` is too large")

            if ((window_shape - 1) < 0).any():
                raise ValueError("`window_shape` is too small")

            # -- build rolling window view
            slices = tuple(slice(None, None, st) for st in step)
            window_strides = array.strides
            indexing_strides = array[slices].strides
            win_indices_shape = ((arr_shape - window_shape) // np.array(step)) + 1

            new_shape = tuple(list(win_indices_shape) + list(window_shape))
            strides = tuple(list(indexing_strides) + list(window_strides))

            arr_out = as_strided(array, shape=new_shape, strides=strides)

            return arr_out

        def UnrollAxis(self, array, axis):
            # This so it works with a single dimension or a sequence of them
            axis = tuple(int(a) for a in np.atleast_1d(axis))
            axis2 = tuple(range(len(axis)))

            # Put unrolled axes at the beginning
            array = np.moveaxis(array, axis, axis2)
            # Unroll
            return array.reshape((-1,) + array.shape[len(axis):])

        def Convolve(self, array, kernel):
            # Assumed helper (not shown in the original post):
            # "same"-mode FFT convolution, matching the calls in Backprop
            return signal.fftconvolve(array, kernel, mode="same")

        def Forward(self, array):
            # Output has one leading channel axis followed by the input shape
            output_shape = (self.channels,) + array.shape
            output = np.zeros(output_shape)

            self.last_input = array

            for i, kernel in enumerate(self.filters):
                conv = self.Convolve(array, kernel)
                output[i] = conv

            return output

        def Backprop(self, d_L_d_out, learn_rate):
            d_A = np.zeros_like(self.last_input)
            d_W = np.zeros_like(self.filters)

            for i, (kernel, d_L_d_out_f) in enumerate(zip(self.filters, d_L_d_out)):
                # Gradient passed back to the previous layer
                d_A += signal.fftconvolve(d_L_d_out_f, kernel.T, "same")
                # Gradient for this filter: sum over kernel-sized windows of the
                # convolution of the output gradient with the stored input
                conv = signal.fftconvolve(d_L_d_out_f, self.last_input, "same")
                conv = self.ViewAsWindows(conv, kernel.shape)
                axes = np.arange(kernel.ndim)
                conv = self.UnrollAxis(conv, axes)
                d_W[i] = np.sum(conv, axis=0)

            output = d_A * learn_rate
            self.filters = self.filters - d_W * learn_rate
            return output
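
One way to sanity-check a backward pass like the one above is to compare analytic gradients against finite differences. The following is a minimal sketch under stated assumptions, not the method from the paper or from the answer: it uses a single-channel input, defines the forward pass as a true convolution in "valid" mode (not the "same" mode used above), and the helper names `flip_all_axes`, `forward`, `backward`, and `numerical_grad` are hypothetical.

```python
import numpy as np
from scipy import signal

def flip_all_axes(a):
    """Reverse an array along every axis."""
    return a[(slice(None, None, -1),) * a.ndim]

def forward(x, w):
    # Assumed forward pass: true N-D convolution without padding.
    return signal.fftconvolve(x, w, mode="valid")

def backward(x, w, grad_out):
    # Input gradient: "full" convolution of the output gradient
    # with the kernel flipped along every axis.
    grad_x = signal.fftconvolve(grad_out, flip_all_axes(w), mode="full")
    # Weight gradient: "valid" cross-correlation of the input with the
    # output gradient, flipped back along every axis.
    grad_w = flip_all_axes(
        signal.fftconvolve(x, flip_all_axes(grad_out), mode="valid")
    )
    return grad_x, grad_w

def numerical_grad(f, a, eps=1e-6):
    """Central-difference gradient of the scalar f() w.r.t. array a (in place)."""
    grad = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        old = a[idx]
        a[idx] = old + eps
        plus = f()
        a[idx] = old - eps
        minus = f()
        a[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4, 6))            # arbitrary 3-D input
w = rng.standard_normal((3, 2, 3))            # arbitrary 3-D kernel
r = rng.standard_normal(forward(x, w).shape)  # fixed upstream gradient

# Scalar loss whose gradient with respect to the output is exactly r.
loss = lambda: float(np.sum(forward(x, w) * r))

grad_x, grad_w = backward(x, w, r)
print(np.allclose(grad_x, numerical_grad(loss, x), atol=1e-5))
print(np.allclose(grad_w, numerical_grad(loss, w), atol=1e-5))
```

Under these assumptions the analytic and numerical gradients should agree. Note that for an N-D array `.T` reverses the order of the axes rather than the order of the elements along each axis, so it is not a substitute for flipping the kernel.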

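Best answer

Multiplying the gradient by learn_rate is usually not enough.

To get better performance and reduce the wild fluctuations, the gradient is usually scaled by an optimizer, for example by dividing it by a running average of the magnitudes of recent gradients (RMSprop).

The update also depends on the error, which is usually noisy if you pass in the error for each sample individually, so it is better to average it over several samples (a mini-batch).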
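
The RMSprop-style scaling mentioned above can be sketched as follows. This is a minimal illustration rather than the answerer's code: the class name `RMSprop`, its `step` method, and the defaults (`decay=0.9`, `eps=1e-8`) are assumptions chosen for the example.

```python
import numpy as np

class RMSprop:
    """Minimal RMSprop sketch: scale each gradient entry by the root of a
    running average of its recent squared values."""

    def __init__(self, learn_rate=1e-3, decay=0.9, eps=1e-8):
        self.learn_rate = learn_rate
        self.decay = decay
        self.eps = eps
        self.avg_sq = None  # running average of squared gradients

    def step(self, params, grad):
        if self.avg_sq is None:
            self.avg_sq = np.zeros_like(grad)
        self.avg_sq = self.decay * self.avg_sq + (1 - self.decay) * grad**2
        return params - self.learn_rate * grad / (np.sqrt(self.avg_sq) + self.eps)

opt = RMSprop(learn_rate=0.01)
filters = np.zeros(3)
grad = np.array([1e6, 1.0, 1e-3])  # gradients of very different magnitudes
updated = opt.step(filters, grad)
print(updated)
```

Because each entry is divided by its own running magnitude, the three updates come out at roughly the same size even though the raw gradients differ by nine orders of magnitude, which is what keeps an occasional huge gradient from blowing up the weights.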
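
Mini-batch averaging can be sketched in a few lines. The function name `minibatch_update` and the array layout (per-sample gradients stacked along axis 0) are assumptions made for this example; in the layer above it would mean computing the weight gradient for each sample first and applying a single update afterwards.

```python
import numpy as np

def minibatch_update(filters, per_sample_grads, learn_rate):
    """Apply one update using the gradient averaged over a mini-batch."""
    mean_grad = np.mean(per_sample_grads, axis=0)  # average over the batch axis
    return filters - learn_rate * mean_grad

filters = np.zeros((2, 3, 3))
# Two hypothetical per-sample gradients with the same shape as the filters.
per_sample_grads = np.stack([np.ones((2, 3, 3)), 3.0 * np.ones((2, 3, 3))])
updated = minibatch_update(filters, per_sample_grads, learn_rate=0.1)
print(updated[0, 0, 0])
```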

Regarding python - N-D convolution backpropagation, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60182609/
