After a full week of print statements, dimensional analysis, refactoring, and reading the code out loud, I can say I'm completely stuck.
The gradients my cost function produces are too far from those produced by finite differences.
I've confirmed that my cost function produces the correct cost for regularized input, but the gradients it returns do not check out. Here is the cost function:
import numpy as np
import pandas as pd

def nnCost(nn_params, X, y, lambda_, input_layer_size, hidden_layer_size, num_labels):
    # reshape parameter/weight vectors to suit network size
    Theta1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)], (hidden_layer_size, (input_layer_size + 1)))
    Theta2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):], (num_labels, (hidden_layer_size + 1)))
    if lambda_ is None:
        lambda_ = 0
    # grab number of observations
    m = X.shape[0]
    # init variables we must return
    cost = 0
    Theta1_grad = np.zeros(Theta1.shape)
    Theta2_grad = np.zeros(Theta2.shape)
    # one-hot encode the vector y
    y_mtx = pd.get_dummies(y.ravel()).to_numpy()
    ones = np.ones((m, 1))
    X = np.hstack((ones, X))
    # layer 1
    a1 = X
    z2 = Theta1@a1.T
    # layer 2
    ones_l2 = np.ones((y.shape[0], 1))
    a2 = np.hstack((ones_l2, sigmoid(z2.T)))
    z3 = Theta2@a2.T
    # layer 3
    a3 = sigmoid(z3)
    reg_term = (lambda_/(2*m)) * (np.sum(np.sum(np.multiply(Theta1, Theta1))) + np.sum(np.sum(np.multiply(Theta2, Theta2))) - np.subtract((Theta1[:,0].T@Theta1[:,0]), (Theta2[:,0].T@Theta2[:,0])))
    cost = (1/m) * np.sum((-np.log(a3).T * (y_mtx) - np.log(1-a3).T * (1-y_mtx))) + reg_term
    # BACKPROPAGATION
    # δ3 equals the difference between a3 and the y_matrix
    d3 = a3 - y_mtx.T
    # δ2 equals the product of δ3 and Θ2 (ignoring the Θ2 bias units) multiplied element-wise by g′(z2) (computed back in Step 2).
    d2 = Theta2[:,1:].T@d3 * sigmoidGradient(z2)
    # Δ1 equals the product of δ2 and a1.
    Delta1 = d2@a1
    Delta1 /= m
    # Δ2 equals the product of δ3 and a2.
    Delta2 = d3@a2
    Delta2 /= m
    reg_term1 = (lambda_/m) * np.append(np.zeros((Theta1.shape[0],1)), Theta1[:,1:], axis=1)
    reg_term2 = (lambda_/m) * np.append(np.zeros((Theta2.shape[0],1)), Theta2[:,1:], axis=1)
    Theta1_grad = Delta1 + reg_term1
    Theta2_grad = Delta2 + reg_term2
    grad = np.append(Theta1_grad.ravel(), Theta2_grad.ravel())
    return cost, grad
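The cost function above calls sigmoid and sigmoidGradient, which are not shown in the question. A minimal sketch of the standard logistic definitions the code appears to assume (my reconstruction, not the original helpers):

import numpy as np

def sigmoid(z):
    # standard logistic function (assumed; not shown in the original question)
    return 1 / (1 + np.exp(-z))

def sigmoidGradient(z):
    # derivative of the logistic function: g(z) * (1 - g(z))
    return sigmoid(z) * (1 - sigmoid(z))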
Here is the code that checks the gradients. I've gone over every line, and I can't think of anything to change; everything looks right to me.
def checkNNGradients(lambda_):
    """
    Creates a small neural network to check the backpropagation gradients.
    Credit: Based on the MATLAB code provided by Dr. Andrew Ng, Stanford Univ.
    Input: Regularization parameter, lambda, as int or float.
    Output: Analytical gradients produced by backprop code and the numerical gradients (computed
            using computeNumericalGradient). These two gradient computations should result in
            very similar values.
    """
    input_layer_size = 3
    hidden_layer_size = 5
    num_labels = 3
    m = 5
    # generate 'random' test data
    Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size)
    Theta2 = debugInitializeWeights(num_labels, hidden_layer_size)
    # reusing debugInitializeWeights to generate X
    X = debugInitializeWeights(m, input_layer_size - 1)
    y = np.ones(m) + np.remainder(np.arange(m), num_labels)
    # unroll parameters
    nn_params = np.append(Theta1.ravel(), Theta2.ravel())
    costFunc = lambda p: nnCost(p, X, y, lambda_, input_layer_size, hidden_layer_size, num_labels)
    cost, grad = costFunc(nn_params)
    numgrad = computeNumericalGradient(costFunc, nn_params)
    # examine the two gradient computations; the two columns should be very similar.
    print('The columns below should be very similar.\n')
    # Credit: http://stackoverflow.com/a/27663954/583834
    print('{:<25}{}'.format('Numerical Gradient', 'Analytical Gradient'))
    for numerical, analytical in zip(numgrad, grad):
        print('{:<25}{}'.format(numerical, analytical))
    # If you have a correct implementation, and assuming you used EPSILON = 0.0001
    # in computeNumericalGradient, then diff below should be less than 1e-9
    diff = np.linalg.norm(numgrad-grad)/np.linalg.norm(numgrad+grad)
    print(diff)
    print("\n")
    print('If your backpropagation implementation is correct, then \n'
          'the relative difference will be small (less than 1e-9). \n'
          '\nRelative Difference: {:.10f}'.format(diff))
The checking function generates its own data using the debugInitializeWeights function (so there is a reproducible example; just run checkNNGradients and it will call the other functions), and then calls the function that computes the gradient via finite differences. Both are below.
def debugInitializeWeights(fan_out, fan_in):
    """
    Initializes the weights of a layer with fan_in
    incoming connections and fan_out outgoing connections using a fixed
    strategy.
    Input: fan_out, number of outgoing connections for a layer as int; fan_in, number
           of incoming connections for the same layer as int.
    Output: Weight matrix, W, of shape (fan_out, 1 + fan_in), where the first column of W handles the "bias" terms
    """
    W = np.zeros((fan_out, 1 + fan_in))
    # Initialize W using "sin"; this ensures that the values in W are of similar scale,
    # which will be useful for debugging
    W = np.sin(range(1, np.size(W)+1)) / 10
    return W.reshape(fan_out, fan_in+1)
def computeNumericalGradient(J, nn_params):
    """
    Computes the gradient using "finite differences"
    and provides a numerical estimate of the gradient (i.e.,
    gradient of the function J around theta).
    Credit: Based on the MATLAB code provided by Dr. Andrew Ng, Stanford Univ.
    Inputs: Cost function, J (here, nnCost); parameter vector, nn_params.
    Output: Gradient vector using finite differences. Per Dr. Ng,
            'Sets numgrad(i) to (a numerical approximation of) the partial derivative of
            J with respect to the i-th input argument, evaluated at theta. (i.e., numgrad(i) should
            be the (approximately) the partial derivative of J with respect
            to theta(i).)'
    """
    numgrad = np.zeros(nn_params.shape)
    perturb = np.zeros(nn_params.shape)
    e = .0001
    for i in range(np.size(nn_params)):
        # Set perturbation (i.e., noise) vector
        perturb[i] = e
        # run cost fxn w/ noise added to and subtracted from parameters theta in nn_params
        cost1, grad1 = J(nn_params - perturb)
        cost2, grad2 = J(nn_params + perturb)
        # record the difference in cost function outputs; this is the numerical gradient
        numgrad[i] = (cost2 - cost1) / (2*e)
        perturb[i] = 0
    return numgrad
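For completeness, running the reproducible example amounts to a single call once the functions above are defined (the lambda values here are just illustrative):

checkNNGradients(0)   # no regularization
checkNNGradients(3)   # with regularization, lambda = 3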
This code isn't for a class. The MOOC was in MATLAB and it's over now; this is just for me. There are other solutions online, but studying them has proven fruitless, as everyone has a different (inscrutable) approach. So I'm in serious need of help, or a miracle.
Edit/Update: Fortran ordering when unrolling the vectors affects the outcome, but I have not been able to get the gradients to move together by changing that option.
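For what it's worth, here is a minimal sketch (mine, not from the original post) of why the unroll/reshape order has to match: ravel and reshape only round-trip cleanly when both use the same memory order, so mixing C (row-major, the NumPy default) and Fortran (column-major, MATLAB-style) ordering scrambles the weights even though the shapes agree.

import numpy as np

Theta = np.arange(6).reshape(2, 3)
flat_c = Theta.ravel(order='C')   # row-major (NumPy default)
flat_f = Theta.ravel(order='F')   # column-major (MATLAB-style)
print(np.array_equal(flat_c.reshape(2, 3, order='C'), Theta))  # True
print(np.array_equal(flat_f.reshape(2, 3, order='F'), Theta))  # True
print(np.array_equal(flat_f.reshape(2, 3, order='C'), Theta))  # False: mixed orders scramble entries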
Best answer
One thought: I think your perturbation is a little large, at 1e-4. For double floats it should be more like 1e-8, i.e. the square root of machine precision (or are you working in single precision?!).
That being said, finite differences can be a very poor approximation of true derivatives. Specifically, floating-point computations in numpy are not deterministic, as you seem to have found out. The noise in the evaluations can cancel out many significant digits under some circumstances. What values are you seeing, and what are you expecting?
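To see the trade-off being described, here is a small self-contained experiment (my illustration, not part of the original answer): a central-difference estimate of the derivative of sin at x = 1, compared against the exact value cos(1), over a range of step sizes. The error first shrinks as the step gets smaller, then grows again once floating-point round-off starts to cancel significant digits.

import numpy as np

x = 1.0
exact = np.cos(x)
for eps in [1e-2, 1e-4, 1e-6, 1e-8, 1e-10]:
    # central difference: (f(x+eps) - f(x-eps)) / (2*eps)
    approx = (np.sin(x + eps) - np.sin(x - eps)) / (2 * eps)
    print('eps={:.0e}  abs error={:.2e}'.format(eps, abs(approx - exact)))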
Regarding "python - Checking neural network gradients with finite differences doesn't work", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66320255/