
python - scipy.optimize.minimize with the L-BFGS-B method: the maxiter option does not seem to work


I have a simple cost function that I want to optimize using the scipy.optimize.minimize function:

opt_solution = scipy.optimize.minimize(costFunction, theta, args = (training_data,), method = 'L-BFGS-B', jac = True, options = {'maxiter': 100})

Here costFunction is the function to be optimized and theta holds the parameters to be optimized. Inside costFunction I print the current value of the cost. However, the maxiter option seems to have no effect, whether I increase it from 10 all the way to 100000: the time taken is the same. I also expected the number of printed cost values to equal maxiter, so it looks to me as if maxiter is being ignored. What could be the problem? The cost function is:

def costFunction(self, theta, input):

    """ Extract weights and biases from 'theta' input """

    W1 = theta[self.limit0 : self.limit1].reshape(self.hidden_size, self.visible_size)
    W2 = theta[self.limit1 : self.limit2].reshape(self.visible_size, self.hidden_size)
    b1 = theta[self.limit2 : self.limit3].reshape(self.hidden_size, 1)
    b2 = theta[self.limit3 : self.limit4].reshape(self.visible_size, 1)

    """ Compute output layers by performing a feedforward pass
    Computation is done for all the training inputs simultaneously """

    hidden_layer = self.sigmoid(numpy.dot(W1, input) + b1)
    output_layer = self.sigmoid(numpy.dot(W2, hidden_layer) + b2)

    """ Compute intermediate difference values using Backpropagation algorithm """

    diff = output_layer - input
    sum_of_squares_error = 0.5 * numpy.sum(numpy.multiply(diff, diff)) / input.shape[1]
    weight_decay = 0.5 * self.lamda * (numpy.sum(numpy.multiply(W1, W1)) +
                                       numpy.sum(numpy.multiply(W2, W2)))
    cost = sum_of_squares_error + weight_decay

    # The original snippet uses del_out and del_hid without defining them; the standard
    # sigmoid-autoencoder backpropagation deltas are assumed here so the code runs as-is.
    del_out = diff * output_layer * (1 - output_layer)
    del_hid = numpy.dot(numpy.transpose(W2), del_out) * hidden_layer * (1 - hidden_layer)

    """ Compute the gradient values by averaging partial derivatives
    Partial derivatives are averaged over all training examples """

    W1_grad = numpy.dot(del_hid, numpy.transpose(input))
    W2_grad = numpy.dot(del_out, numpy.transpose(hidden_layer))
    b1_grad = numpy.sum(del_hid, axis = 1)
    b2_grad = numpy.sum(del_out, axis = 1)

    W1_grad = W1_grad / input.shape[1] + self.lamda * W1
    W2_grad = W2_grad / input.shape[1] + self.lamda * W2
    b1_grad = b1_grad / input.shape[1]
    b2_grad = b2_grad / input.shape[1]

    """ Transform numpy matrices into arrays """

    W1_grad = numpy.array(W1_grad)
    W2_grad = numpy.array(W2_grad)
    b1_grad = numpy.array(b1_grad)
    b2_grad = numpy.array(b2_grad)

    """ Unroll the gradient values and return as 'theta' gradient """

    theta_grad = numpy.concatenate((W1_grad.flatten(), W2_grad.flatten(),
                                    b1_grad.flatten(), b2_grad.flatten()))

    # Update counter value
    self.counter += 1
    print("Index ", self.counter, "cost ", cost)
    return [cost, theta_grad]
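(The code above also calls self.sigmoid, which is not shown in the question. A minimal sketch of the usual elementwise logistic helper, assumed here only for completeness:)

def sigmoid(self, x):
    # Elementwise logistic function; assumed implementation, not shown in the question
    return 1.0 / (1.0 + numpy.exp(-x))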

Best Answer

maxiter gives the maximum number of iterations that scipy will attempt before giving up on improving the solution. But it may well be satisfied with the solution it has found and stop earlier.
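A minimal, self-contained sketch (not the asker's code) that illustrates this: a simple quadratic converges long before a huge iteration cap, and the returned OptimizeResult reports how many iterations were actually used and why the solver stopped.

import numpy as np
from scipy.optimize import minimize

def quad(x):
    # f(x) = ||x - 3||^2 together with its analytic gradient (matching jac = True)
    diff = x - 3.0
    return np.sum(diff ** 2), 2.0 * diff

res = minimize(quad, np.zeros(5), method = 'L-BFGS-B', jac = True,
               options = {'maxiter': 100000})
print(res.nit)      # iterations actually performed -- far fewer than maxiter
print(res.message)  # shows that convergence, not the iteration cap, stopped the run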

If you look at the docs for minimize when using the 'L-BFGS-B' method, note that there are three parameters you can pass as options (factr, ftol, and gtol) that also cause the iteration to stop.
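For example, ftol and gtol can be tightened through the same options dict if you want maxiter, rather than the tolerances, to be the binding limit. (Note that factr belongs to the older scipy.optimize.fmin_l_bfgs_b interface; minimize exposes ftol instead. The values below are illustrative only.)

opt_solution = scipy.optimize.minimize(costFunction, theta, args = (training_data,),
                                       method = 'L-BFGS-B', jac = True,
                                       options = {'maxiter': 100,
                                                  'ftol': 1e-12,   # minimum relative decrease in the cost per iteration
                                                  'gtol': 1e-12})  # threshold on the projected gradient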

In a simple case like yours, especially when your cost function also provides the gradient (as indicated by jac = True in the call), convergence typically happens within the first few iterations, hence well before the maxiter limit is reached.
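Note also that the counter printed inside costFunction counts function evaluations, not L-BFGS-B iterations: the line search can evaluate the cost (and gradient) several times per iteration, so the printed count will not match maxiter even when the limit is actually reached. The two numbers can be read separately from the returned result, for example:

res = opt_solution  # the OptimizeResult returned by scipy.optimize.minimize
print(res.nit)      # iterations performed by L-BFGS-B
print(res.nfev)     # calls to costFunction (>= res.nit because of line searches)
print(res.message)  # reason the solver stopped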

Regarding python - scipy.optimize.minimize with the L-BFGS-B method: the maxiter option does not seem to work, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44515528/
