python - Checking neural network gradients with the finite difference method doesn't work


After an entire week of print statements, dimensional analysis, refactoring, and reading the code out loud, I can say I am completely stuck.

The gradients produced by my cost function are far too different from the gradients produced by finite differences.

I have confirmed that my cost function produces the correct cost for regularized and unregularized input. Here is the cost function:

import numpy as np
import pandas as pd
# sigmoid and sigmoidGradient are helper functions defined elsewhere in the project.

def nnCost(nn_params, X, y, lambda_, input_layer_size, hidden_layer_size, num_labels):
    # reshape parameter/weight vectors to suit network size
    Theta1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)], (hidden_layer_size, (input_layer_size + 1)))
    Theta2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):], (num_labels, (hidden_layer_size + 1)))

    if lambda_ is None:
        lambda_ = 0

    # grab number of observations
    m = X.shape[0]

    # init variables we must return
    cost = 0
    Theta1_grad = np.zeros(Theta1.shape)
    Theta2_grad = np.zeros(Theta2.shape)

    # one-hot encode the vector y
    y_mtx = pd.get_dummies(y.ravel()).to_numpy()

    # add the bias column to the input
    ones = np.ones((m, 1))
    X = np.hstack((ones, X))

    # layer 1
    a1 = X
    z2 = Theta1 @ a1.T
    # layer 2
    ones_l2 = np.ones((y.shape[0], 1))
    a2 = np.hstack((ones_l2, sigmoid(z2.T)))
    z3 = Theta2 @ a2.T
    # layer 3
    a3 = sigmoid(z3)

    # regularized cross-entropy cost
    reg_term = (lambda_/(2*m)) * (np.sum(np.sum(np.multiply(Theta1, Theta1))) + np.sum(np.sum(np.multiply(Theta2, Theta2))) - np.subtract((Theta1[:, 0].T @ Theta1[:, 0]), (Theta2[:, 0].T @ Theta2[:, 0])))
    cost = (1/m) * np.sum((-np.log(a3).T * (y_mtx) - np.log(1-a3).T * (1-y_mtx))) + reg_term

    # BACKPROPAGATION
    # δ3 equals the difference between a3 and the y_matrix
    d3 = a3 - y_mtx.T
    # δ2 equals the product of δ3 and Θ2 (ignoring the Θ2 bias units) multiplied element-wise by the g′() of z2 (computed back in Step 2).
    d2 = Theta2[:, 1:].T @ d3 * sigmoidGradient(z2)
    # Δ1 equals the product of δ2 and a1.
    Delta1 = d2 @ a1
    Delta1 /= m
    # Δ2 equals the product of δ3 and a2.
    Delta2 = d3 @ a2
    Delta2 /= m

    reg_term1 = (lambda_/m) * np.append(np.zeros((Theta1.shape[0], 1)), Theta1[:, 1:], axis=1)
    reg_term2 = (lambda_/m) * np.append(np.zeros((Theta2.shape[0], 1)), Theta2[:, 1:], axis=1)

    Theta1_grad = Delta1 + reg_term1
    Theta2_grad = Delta2 + reg_term2

    grad = np.append(Theta1_grad.ravel(), Theta2_grad.ravel())

    return cost, grad

Here is the code that checks the gradients. I have gone over every line and cannot think of anything to change. It all looks correct to me.

def checkNNGradients(lambda_):
    """
    Creates a small neural network to check the backpropagation gradients.
    Credit: Based on the MATLAB code provided by Dr. Andrew Ng, Stanford Univ.

    Input: Regularization parameter, lambda, as int or float.

    Output: Analytical gradients produced by backprop code and the numerical gradients (computed
            using computeNumericalGradient). These two gradient computations should result in
            very similar values.
    """

    input_layer_size = 3
    hidden_layer_size = 5
    num_labels = 3
    m = 5

    # generate 'random' test data
    Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size)
    Theta2 = debugInitializeWeights(num_labels, hidden_layer_size)

    # reusing debugInitializeWeights to generate X
    X = debugInitializeWeights(m, input_layer_size - 1)
    y = np.ones(m) + np.remainder(np.arange(m), num_labels)

    # unroll parameters
    nn_params = np.append(Theta1.ravel(), Theta2.ravel())
    costFunc = lambda p: nnCost(p, X, y, lambda_, input_layer_size, hidden_layer_size, num_labels)

    cost, grad = costFunc(nn_params)

    numgrad = computeNumericalGradient(costFunc, nn_params)

    # examine the two gradient computations; the two columns should be very similar.
    print('The columns below should be very similar.\n')

    # Credit: http://stackoverflow.com/a/27663954/583834
    print('{:<25}{}'.format('Numerical Gradient', 'Analytical Gradient'))
    for numerical, analytical in zip(numgrad, grad):
        print('{:<25}{}'.format(numerical, analytical))

    # If you have a correct implementation, and assuming you used EPSILON = 0.0001
    # in computeNumericalGradient, then diff below should be less than 1e-9
    diff = np.linalg.norm(numgrad - grad) / np.linalg.norm(numgrad + grad)
    print(diff)
    print("\n")
    print('If your backpropagation implementation is correct, then \n'
          'the relative difference will be small (less than 1e-9). \n'
          '\nRelative Difference: {:.10f}'.format(diff))

The checking function generates its own data with the debugInitializeWeights function (so there is a reproducible example; just run checkNNGradients and it calls the other functions), and then calls the function that computes the gradient using finite differences. Both are below.

def debugInitializeWeights(fan_out, fan_in):
    """
    Initializes the weights of a layer with fan_in
    incoming connections and fan_out outgoing connections using a fixed
    strategy.

    Input: fan_out, number of outgoing connections for a layer as int; fan_in, number
           of incoming connections for the same layer as int.

    Output: Weight matrix, W, of size (fan_out, 1 + fan_in), where the first column of W
            handles the "bias" terms.
    """
    W = np.zeros((fan_out, 1 + fan_in))
    # Initialize W using "sin"; this ensures that the values in W are of similar scale,
    # which is useful for debugging
    W = np.sin(range(1, np.size(W) + 1)) / 10
    return W.reshape(fan_out, fan_in + 1)

def computeNumericalGradient(J, nn_params):
    """
    Computes the gradient using "finite differences"
    and provides a numerical estimate of the gradient (i.e.,
    gradient of the function J around theta).
    Credit: Based on the MATLAB code provided by Dr. Andrew Ng, Stanford Univ.

    Inputs: Cost function handle, J (here, nnCost with the data bound); parameter vector, nn_params.

    Output: Gradient vector using finite differences. Per Dr. Ng,
            'Sets numgrad(i) to (a numerical approximation of) the partial derivative of
            J with respect to the i-th input argument, evaluated at theta. (i.e., numgrad(i) should
            be the (approximately) the partial derivative of J with respect
            to theta(i).)'
    """
    numgrad = np.zeros(nn_params.shape)
    perturb = np.zeros(nn_params.shape)
    e = .0001
    for i in range(np.size(nn_params)):
        # set perturbation (i.e., noise) vector
        perturb[i] = e
        # run cost fxn w/ noise added to and subtracted from parameters theta in nn_params
        cost1, grad1 = J((nn_params - perturb))
        cost2, grad2 = J((nn_params + perturb))
        # record the difference in cost function outputs; this is the numerical gradient
        numgrad[i] = (cost2 - cost1) / (2*e)
        perturb[i] = 0

    return numgrad

This code is not for a class. The MOOC was in MATLAB and it is over now; this is for me. Other solutions exist online, but studying them has proven fruitless: everyone has a different (inscrutable) approach. So I am in serious need of help, or a miracle.

EDIT/UPDATE: Fortran ordering when raveling the vectors affects the results, but I have not been able to get the gradients to move together by changing that option.
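For reference, the ordering should not matter as long as the same order is used to unroll the parameters and to reshape them back. A minimal sketch of that round trip (the sizes mirror the debug network above, but this snippet is illustrative rather than my actual code):

import numpy as np

# Illustrative round-trip check: whichever order ('F' or 'C') is used to unroll the
# parameters must also be used when reshaping them back inside the cost function,
# otherwise the analytical and numerical gradients index different weights.
hidden_layer_size, input_layer_size, num_labels = 5, 3, 3
Theta1 = np.arange(hidden_layer_size * (input_layer_size + 1), dtype=float).reshape(hidden_layer_size, input_layer_size + 1)
Theta2 = np.arange(num_labels * (hidden_layer_size + 1), dtype=float).reshape(num_labels, hidden_layer_size + 1)

order = 'F'  # pick one order and use it in BOTH directions
nn_params = np.concatenate([Theta1.ravel(order=order), Theta2.ravel(order=order)])

T1_back = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)],
                     (hidden_layer_size, input_layer_size + 1), order=order)
T2_back = np.reshape(nn_params[hidden_layer_size * (input_layer_size + 1):],
                     (num_labels, hidden_layer_size + 1), order=order)

assert np.array_equal(T1_back, Theta1) and np.array_equal(T2_back, Theta2)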

Best Answer

One thought: I think your perturbation is a little large at 1e-4. For double-precision floats it should be more like 1e-8, i.e., the square root of machine precision (or are you working in single precision?!).
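Roughly the kind of thing I mean (assuming double precision and a cost function that returns only a scalar; the names here are just for illustration, not taken from your code):

import numpy as np

# Step size on the order of the square root of machine epsilon for float64
# (~1.5e-8), rather than the fixed 1e-4 used above.
e = np.sqrt(np.finfo(np.float64).eps)

def numerical_gradient(J, theta, e=e):
    # Central-difference estimate of the gradient of a scalar-valued J at theta.
    grad = np.zeros_like(theta)
    perturb = np.zeros_like(theta)
    for i in range(theta.size):
        perturb[i] = e
        grad[i] = (J(theta + perturb) - J(theta - perturb)) / (2 * e)
        perturb[i] = 0.0
    return grad

# Quick sanity check on a function with a known gradient (2 * theta):
theta0 = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(lambda t: np.sum(t ** 2), theta0))  # approximately [2., -4., 6.]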

That being said, finite differences can be a very rough approximation to the true derivative. Specifically, floating-point computations in numpy are not deterministic, as you seem to have found out. In some circumstances, the noise in the evaluations can cancel out many significant digits. What values are you seeing, and what were you expecting?
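You can see that cancellation on a toy problem by sweeping the step size for a function with a known derivative; below a certain step, rounding noise dominates and the estimate gets worse again (illustrative sketch, not part of your code):

import numpy as np

# Central-difference error for f(x) = exp(x) at x = 1, whose true derivative is exp(1).
# The error shrinks as h decreases until floating-point cancellation takes over.
x, true_deriv = 1.0, np.exp(1.0)
for h in (1e-2, 1e-4, 1e-6, 1e-8, 1e-10, 1e-12):
    approx = (np.exp(x + h) - np.exp(x - h)) / (2 * h)
    print('h = {:.0e}   abs error = {:.3e}'.format(h, abs(approx - true_deriv)))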

Regarding "python - Checking neural network gradients with the finite difference method doesn't work", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66320255/
