
python - Manual SGD in Caffe Python


I am trying to implement the SGD functionality to update weights manually in caffe python, instead of using the solver.step() function. The goal is to match the weight updates produced by solver.step() by performing the updates by hand.

The setup is as follows: use MNIST data. Set the random seed in solver.prototxt to random_seed: 52. Make sure momentum: 0.0 and base_lr: 0.01, lr_policy: "fixed". This is done so that I can implement the plain SGD update equation (with no momentum, regularization, etc.). The equation is simply: W_t+1 = W_t - mu * W_t_diff
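As a quick illustration, here is a minimal NumPy sketch of that update rule in isolation (W and grad are hypothetical stand-ins for a blob's data and diff, not part of the setup above):

import numpy as np

mu = 0.01                           # learning rate (base_lr)
W = np.array([0.5, -0.3, 0.8])      # hypothetical weights (blob.data)
grad = np.array([0.1, 0.2, -0.4])   # hypothetical gradients (blob.diff)

# W_t+1 = W_t - mu * W_t_diff
W -= mu * grad
print(W)  # [ 0.499 -0.302  0.804]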

Here are the two tests:

Test 1: Compute the forward and backward passes using Caffe's forward() and backward(). Then, for every layer that contains weights, I do:

for k in weight_layer_idx:
    solver.net.layers[k].blobs[0].diff[...] *= lr  # weights
    solver.net.layers[k].blobs[1].diff[...] *= lr  # biases

Next, update the weights/biases as:

    solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
    solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff

I ran this for 5 iterations.

Test 2: Run Caffe's solver.step(5).

Now, I would expect the two tests to yield exactly the same weights after the 5 iterations.

I saved the weight values after each of the above tests, computed the norm of the difference between the weight vectors of the two tests, and found that they are not bit-exact. Can anyone spot something I might be missing?

The complete code is below, for reference:

import caffe
caffe.set_device(0)
caffe.set_mode_gpu()
import numpy as np
from copy import copy  # needed for copy() below

niter = 5
solver = None
solver = caffe.SGDSolver('solver.prototxt')

# Automatic SGD: TEST2
solver.step(niter)
# save the weights to compare later
w_solver_step = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_solver_step = copy(solver.net.layers[1].blobs[1].data.astype('float64'))

# Manual SGD: TEST1
solver = None
solver = caffe.SGDSolver('solver.prototxt')
lr = 0.01
momentum = 0.

# Get layer types
layer_types = []
for ll in solver.net.layers:
    layer_types.append(ll.type)

# Get the indices of layers that have weights in them
weight_layer_idx = [idx for idx,l in enumerate(layer_types) if 'Convolution' in l or 'InnerProduct' in l]

for it in range(1, niter+1):
    solver.net.forward()   # fprop
    solver.net.backward()  # bprop
    for k in weight_layer_idx:
        solver.net.layers[k].blobs[0].diff[...] *= lr
        solver.net.layers[k].blobs[1].diff[...] *= lr
        solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
        solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff

# save the weights to compare later
w_fwdbwd_update = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_fwdbwd_update = copy(solver.net.layers[1].blobs[1].data.astype('float64'))

# Compare
print "after iter", niter, ": weight diff: ", np.linalg.norm(w_solver_step - w_fwdbwd_update), "and bias diff:", np.linalg.norm(b_solver_step - b_fwdbwd_update)

The last line, which compares the weights from the two tests, produces:

after iter 5 : weight diff: 0.000203027766144 and bias diff: 1.78390789051e-05

I would expect this difference to be 0.0.

Any ideas?

Best Answer

You are almost right; you just need to set the diffs to zero after each update. Caffe does not do this automatically, in order to give you the chance to implement batch accumulation (accumulating the gradients over multiple batches for a single weight update, which is helpful if you don't have enough memory for your desired batch size).
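In the question's loop over weight_layer_idx, that amounts to adding two lines right after the manual data update (a minimal sketch against the question's own variables):

for k in weight_layer_idx:
    # ... manual gradient scaling and data update as above, then:
    solver.net.layers[k].blobs[0].diff[...] = 0  # zero the weight gradients
    solver.net.layers[k].blobs[1].diff[...] = 0  # zero the bias gradients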

Another possible issue is the use of cudnn: its convolution implementation is non-deterministic (or, more precisely, the way it is configured to be used in Caffe is). Usually this is not a problem, but in your case it leads to slightly different results each run, and therefore to different weights. If you compiled Caffe with cudnn, you can simply set the mode to CPU to prevent this while testing.
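With the Python interface that is a one-line change at the top of the script:

caffe.set_mode_cpu()  # sidestep cudnn's non-deterministic convolutions while testing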

Furthermore, the solver parameters influence the weight update. As you mentioned, you should pay attention to the following (a minimal solver.prototxt sketch follows this list):

  • lr_policy: "fixed"
  • momentum: 0
  • weight_decay: 0
  • random_seed: 52 # or any other constant
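Put together, a solver.prototxt matching this setup could look like the sketch below; the net path is a placeholder, not taken from the question:

net: "train_val.prototxt"  # placeholder path to the network definition
base_lr: 0.01
lr_policy: "fixed"
momentum: 0.0
weight_decay: 0.0
random_seed: 52
solver_mode: CPU           # avoids the cudnn non-determinism mentioned above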

In the network, make sure not to use any learning-rate multipliers. Usually the biases are learned twice as fast as the weights, but that is not the behavior your implementation produces, so you need to make sure they are set to one in the layer definitions:

param {
  lr_mult: 1 # weight lr multiplier
}
param {
  lr_mult: 1 # bias lr multiplier
}
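In context, these param blocks go inside each learnable layer. A sketch for a hypothetical InnerProduct layer (the layer name, blob names, and output size are illustrative only):

layer {
  name: "ip1"              # illustrative layer name
  type: "InnerProduct"
  bottom: "data"
  top: "ip1"
  param { lr_mult: 1 }     # weight lr multiplier
  param { lr_mult: 1 }     # bias lr multiplier (reference models often use 2 here)
  inner_product_param {
    num_output: 500        # illustrative
  }
}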

Last but not least, here is an example of what your code could look like with momentum, weight decay, and the lr_mults taken into account. In CPU mode, this produces the expected output (no difference):

import caffe
caffe.set_device(0)
caffe.set_mode_cpu()
import numpy as np

niter = 5
solver = None
solver = caffe.SGDSolver('solver.prototxt')

# Automatic SGD: TEST2
solver.step(niter)
# save the weights to compare later
w_solver_step = solver.net.layers[1].blobs[0].data.copy()
b_solver_step = solver.net.layers[1].blobs[1].data.copy()

# Manual SGD: TEST1
solver = None
solver = caffe.SGDSolver('solver.prototxt')
base_lr = 0.01
momentum = 0.9
weight_decay = 0.0005
lr_w_mult = 1
lr_b_mult = 2

momentum_hist = {}
for layer in solver.net.params:
    m_w = np.zeros_like(solver.net.params[layer][0].data)
    m_b = np.zeros_like(solver.net.params[layer][1].data)
    momentum_hist[layer] = [m_w, m_b]

for i in range(niter):
    solver.net.forward()
    solver.net.backward()
    for layer in solver.net.params:
        momentum_hist[layer][0] = momentum_hist[layer][0] * momentum + (
            solver.net.params[layer][0].diff
            + weight_decay * solver.net.params[layer][0].data) * base_lr * lr_w_mult
        momentum_hist[layer][1] = momentum_hist[layer][1] * momentum + (
            solver.net.params[layer][1].diff
            + weight_decay * solver.net.params[layer][1].data) * base_lr * lr_b_mult
        solver.net.params[layer][0].data[...] -= momentum_hist[layer][0]
        solver.net.params[layer][1].data[...] -= momentum_hist[layer][1]
        solver.net.params[layer][0].diff[...] *= 0  # zero the diffs after each update
        solver.net.params[layer][1].diff[...] *= 0

# save the weights to compare later
w_fwdbwd_update = solver.net.layers[1].blobs[0].data.copy()
b_fwdbwd_update = solver.net.layers[1].blobs[1].data.copy()

# Compare
print "after iter", niter, ": weight diff: ", np.linalg.norm(w_solver_step - w_fwdbwd_update), "and bias diff:", np.linalg.norm(b_solver_step - b_fwdbwd_update)

Regarding python - Manual SGD in Caffe Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36459266/
