python - How to implement momentum-based stochastic gradient descent (SGD)

Reposted by 太空宇宙, updated 2023-11-03 13:35:50

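I'm using the Python code network3.py ( http://neuralnetworksanddeeplearning.com/chap6.html ) to develop a convolutional neural network. Now I want to modify the code slightly by adding the following momentum learning rule: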

velocity = momentum_constant * velocity - learning_rate * gradient
params = params + velocity
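Indexing the rule by iteration $t$ (writing $\theta$ for `params`, $v$ for `velocity`, $\mu$ for `momentum_constant`, $\eta$ for `learning_rate` and $g_t$ for the gradient at step $t$) and starting from a zero velocity, unrolling the recursion shows that the velocity is an exponentially weighted sum of all past gradients. This is a sketch of the algebra only, but it suggests why zero is the natural initial value: before any gradient has been seen, nothing has accumulated.

$$
v_{t+1} = \mu\, v_t - \eta\, g_t, \qquad \theta_{t+1} = \theta_t + v_{t+1}, \qquad v_0 = 0
$$

$$
\Longrightarrow \quad v_t = -\eta \sum_{k=0}^{t-1} \mu^{\,t-1-k}\, g_k \qquad (t \ge 1)
$$

Because $v_t$ depends on every earlier gradient, the velocity has to be stored and carried from one mini-batch to the next, one entry per parameter.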

Does anyone know how to do this? In particular, how should the velocity be set up or initialized? I've posted the SGD code below:

def __init__(self, layers, mini_batch_size):
    """Takes a list of `layers`, describing the network architecture, and
    a value for the `mini_batch_size` to be used during training
    by stochastic gradient descent.

    """
    self.layers = layers
    self.mini_batch_size = mini_batch_size
    self.params = [param for layer in self.layers for param in layer.params]
    self.x = T.matrix("x")
    self.y = T.ivector("y")
    init_layer = self.layers[0]
    init_layer.set_inpt(self.x, self.x, self.mini_batch_size)
    for j in xrange(1, len(self.layers)):
        prev_layer, layer  = self.layers[j-1], self.layers[j]
        layer.set_inpt(
            prev_layer.output, prev_layer.output_dropout, self.mini_batch_size)
    self.output = self.layers[-1].output
    self.output_dropout = self.layers[-1].output_dropout


def SGD(self, training_data, epochs, mini_batch_size, eta,
        validation_data, test_data, lmbda=0.0):
    """Train the network using mini-batch stochastic gradient descent."""
    training_x, training_y = training_data
    validation_x, validation_y = validation_data
    test_x, test_y = test_data

    # compute number of minibatches for training, validation and testing
    num_training_batches = size(training_data)/mini_batch_size
    num_validation_batches = size(validation_data)/mini_batch_size
    num_test_batches = size(test_data)/mini_batch_size

    # define the (regularized) cost function, symbolic gradients, and updates
    l2_norm_squared = sum([(layer.w**2).sum() for layer in self.layers])
    cost = self.layers[-1].cost(self)+\
           0.5*lmbda*l2_norm_squared/num_training_batches
    grads = T.grad(cost, self.params)
    updates = [(param, param-eta*grad)
               for param, grad in zip(self.params, grads)]

    # define functions to train a mini-batch, and to compute the
    # accuracy in validation and test mini-batches.
    i = T.lscalar() # mini-batch index
    train_mb = theano.function(
        [i], cost, updates=updates,
        givens={
            self.x:
            training_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size],
            self.y:
            training_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
        })
    validate_mb_accuracy = theano.function(
        [i], self.layers[-1].accuracy(self.y),
        givens={
            self.x:
            validation_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size],
            self.y:
            validation_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
        })
    test_mb_accuracy = theano.function(
        [i], self.layers[-1].accuracy(self.y),
        givens={
            self.x:
            test_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size],
            self.y:
            test_y[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
        })
    self.test_mb_predictions = theano.function(
        [i], self.layers[-1].y_out,
        givens={
            self.x:
            test_x[i*self.mini_batch_size: (i+1)*self.mini_batch_size]
        })
    # Do the actual training
    best_validation_accuracy = 0.0
    for epoch in xrange(epochs):
        for minibatch_index in xrange(num_training_batches):
            iteration = num_training_batches*epoch+minibatch_index
            if iteration % 1000 == 0:
                print("Training mini-batch number {0}".format(iteration))
            cost_ij = train_mb(minibatch_index)
            if (iteration+1) % num_training_batches == 0:
                validation_accuracy = np.mean(
                    [validate_mb_accuracy(j) for j in xrange(num_validation_batches)])
                print("Epoch {0}: validation accuracy {1:.2%}".format(
                    epoch, validation_accuracy))
                if validation_accuracy >= best_validation_accuracy:
                    print("This is the best validation accuracy to date.")
                    best_validation_accuracy = validation_accuracy
                    best_iteration = iteration
                    if test_data:
                        test_accuracy = np.mean(
                            [test_mb_accuracy(j) for j in xrange(num_test_batches)])
                        print('The corresponding test accuracy is {0:.2%}'.format(
                            test_accuracy))
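For comparison, here is a hedged, Theano-free sketch of the same mini-batch loop with the momentum rule added. It assumes a hypothetical toy problem (fitting `y = w*x + b` to noiseless data by least squares), and the helper names `grad_mse` and `train_momentum_sgd` are illustrative, not part of network3.py. The point to notice is where the velocities live: they are created once, as zeros with one entry per parameter, before the epoch loop, and carried across every mini-batch.

```python
import random


def grad_mse(params, batch):
    """Gradient of the mean squared error of y_hat = w*x + b over one mini-batch."""
    w, b = params
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return [grad_w, grad_b]


def train_momentum_sgd(data, epochs, mini_batch_size, eta, momentum_constant):
    """Mini-batch SGD with momentum; returns the trained [w, b]."""
    data = list(data)                      # work on a copy before shuffling
    params = [0.0, 0.0]                    # [w, b]
    velocities = [0.0 for _ in params]     # one zero velocity per parameter, created once
    num_training_batches = len(data) // mini_batch_size
    for _ in range(epochs):
        random.shuffle(data)
        for i in range(num_training_batches):
            batch = data[i * mini_batch_size:(i + 1) * mini_batch_size]
            grads = grad_mse(params, batch)
            # velocity = momentum_constant * velocity - learning_rate * gradient
            velocities = [momentum_constant * v - eta * g
                          for v, g in zip(velocities, grads)]
            # params = params + velocity
            params = [p + v for p, v in zip(params, velocities)]
    return params


random.seed(0)
# Hypothetical noiseless data on the line y = 3x + 1, for x = -2.0, -1.9, ..., 2.0
toy_data = [(k / 10, 3 * (k / 10) + 1) for k in range(-20, 21)]
w, b = train_momentum_sgd(toy_data, epochs=200, mini_batch_size=8,
                          eta=0.05, momentum_constant=0.9)
print(w, b)  # should end up close to 3 and 1
```

Setting `momentum_constant=0.0` makes each velocity equal to `-eta * grad`, which recovers the plain update `param - eta*grad` used in the `updates` list above.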

Best answer

I have only written SGD from scratch (without Theano), but judging from your code you need to:

1) Initialize the velocities with zeros (one per gradient),

2) Include the velocity in the updates; something like

updates = [(param, param-eta*grad +momentum_constant*vel)
           for param, grad, vel in zip(self.params, grads, velocities)]

3) Modify your training function to return the gradients at every iteration, so that you can update the velocities.
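To check the algebra behind step 2, here is a hedged scalar sketch in plain Python (not Theano), on a hypothetical objective f(x) = x**2 with `eta=0.1` and `momentum_constant=0.9` chosen only for illustration. It relies on the fact that Theano evaluates every expression in `updates` from the values the shared variables had before the call, so `vel` in step 2 is the old velocity. The sketch confirms that the one-line update matches the two-step rule from the question, and shows why step 3 matters: the velocity must be advanced separately, otherwise it stays at zero and the update is just plain SGD.

```python
def answer_style_step(param, grad, vel, eta, momentum_constant):
    """Step 2 of the answer: the new param, computed from the OLD velocity."""
    return param - eta * grad + momentum_constant * vel


def rule_style_step(param, grad, vel, eta, momentum_constant):
    """The rule from the question: update the velocity first, then the param."""
    new_vel = momentum_constant * vel - eta * grad
    return param + new_vel, new_vel


param, vel = 1.0, 0.0              # step 1: the velocity starts at zero
for _ in range(5):
    grad = 2 * param               # gradient of the toy objective f(x) = x**2
    expected_param, new_vel = rule_style_step(param, grad, vel, 0.1, 0.9)
    param_via_answer = answer_style_step(param, grad, vel, 0.1, 0.9)
    assert abs(param_via_answer - expected_param) < 1e-12
    param, vel = param_via_answer, new_vel   # step 3: carry the new velocity forward
print(param, vel)
```

Rather than returning the gradients from `train_mb` as step 3 suggests, a common Theano alternative (our assumption, not part of the answer) is to keep each velocity in a `theano.shared` variable created with `np.zeros_like(param.get_value())` and to add a `(vel, momentum_constant*vel - eta*grad)` pair to `updates` next to the parameter's own update, so both are advanced inside the same compiled call.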

Regarding python - How to implement momentum-based stochastic gradient descent (SGD), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39855184/
