Edit: see the end of this question for the solution.
TL;DR: I need a way to compute the label distribution of each batch and use that value to update the learning rate. Is there a way to access the current model's optimizer so the learning rate can be updated per batch?

Below is how the label distribution can be computed. It could be done inside the loss function, since by default the loss is computed per batch. But where can this code be executed so that it also has access to the model's optimizer?
from tensorflow.python.ops import math_ops
import tensorflow as tf

def loss(y_true, y_pred):
    y = math_ops.argmax(y_true, axis=1)
    freqs = tf.gather(lf, y)                 # equal to lf[y] if `lf` and `y` were numpy arrays
    inv_freqs = math_ops.pow(freqs, -1)
    E = 1 / math_ops.reduce_sum(inv_freqs)   # value to use when updating the learning rate
To implement the learning rate schedule described in this paper, I believe I need a way to update the learning rate during training, each batch, by a value computed from the label distribution of the true labels in the batch (y_true, as it is typically denoted in keras/tensorflow),

where ...

x - the output from the model
y - the corresponding ground truth labels
Β - the minibatch of m samples (e.g. 64)
n_y - the entire training sample size for ground truth label y
n_y^-1 - the inverse label frequency

The parts of the formula I'm concerned with are α and Δθ.
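For reference, here is my rough reading of the re-weighted update step, written with the symbols above (a paraphrase, not a verbatim copy of the paper's equation):

Δθ = -α · (Σ_{(x,y)∈Β} n_y^{-1})^{-1} · Σ_{(x,y)∈Β} n_y^{-1} · ∇_θ L(x, y)

i.e. each sample's gradient contribution is weighted by its inverse label frequency n_y^{-1}, and the effective step size α is scaled by 1 / Σ n_y^{-1}, which is exactly the E value computed in the snippet below.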
I can implement this easily inside a custom loss function, but I don't know how to update the learning rate from the loss function, if that is even possible.
def loss(y_true, y_pred):
    y = math_ops.argmax(y_true, axis=1)
    freqs = tf.gather(lf, y)                 # equal to lf[y] if `lf` and `y` were numpy arrays
    inv_freqs = math_ops.pow(freqs, -1)
    E = 1 / math_ops.reduce_sum(inv_freqs)   # value to use when updating the learning rate

where ...

lf - the sample frequencies for each class, e.g. for 2 classes with c0 = 10 examples and c1 = 100, lf == [10, 100]
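For completeness, a minimal sketch of how `lf` could be precomputed from the full set of training labels (the variable names here are illustrative, not from the original code):

import numpy as np
import tensorflow as tf

# Hypothetical example: integer class ids for the whole training set,
# 2 classes with c0 = 10 examples and c1 = 100 examples.
y_train = np.array([0] * 10 + [1] * 100)
lf = tf.constant(np.bincount(y_train), dtype=tf.float32)   # lf == [10., 100.]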
Is there some clever way to update the optimizer's learning rate, for example along the lines of what can be done from a callback?
def on_batch_begin(self, batch, log):
    # note: batch is just an incremented value indicating the batch index
    self.model.optimizer.lr   # learning rate, can be modified from a callback
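For illustration, here is a minimal, self-contained sketch of that idea (the class name and the 0.999 factor are made up purely to show the mechanics of changing the learning rate from a callback):

from tensorflow import keras
from tensorflow.keras import backend as K

class PerBatchLR(keras.callbacks.Callback):
    """Toy example: rescale the optimizer's learning rate at the start of every batch."""
    def on_batch_begin(self, batch, logs=None):
        current_lr = K.get_value(self.model.optimizer.lr)
        K.set_value(self.model.optimizer.lr, current_lr * 0.999)

# The catch, and the point of this question: on_batch_begin has no access to y_true,
# so the label distribution cannot be computed here directly.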
Thanks in advance for any help!
Many thanks to @mrk for pushing me in the right direction towards solving this!
To compute the label distribution per batch, and then use that value to update the optimizer's learning rate, one must...

1. Define a custom metric (LabelDistribution below) that records the per-batch distribution of y_true, so it shows up in the training logs.
2. Create a learning rate scheduler by subclassing the keras.callbacks.History class; inside its on_batch_end function, the logs dict will contain all computed metrics for the batch, including our custom label distribution metric!

import tensorflow as tf
from tensorflow import keras, VariableAggregation
from tensorflow.keras import backend as K
from tensorflow.python.ops import math_ops as mo

class LabelDistribution(tf.keras.metrics.Metric):
    """
    Computes the per-batch label distribution (y_true) and stores the array as
    a metric which can be accessed via keras Callbacks

    :param n_class: int - number of distinct output class(es)
    """

    def __init__(self, n_class, name='batch_label_distribution', **kwargs):
        super(LabelDistribution, self).__init__(name=name, **kwargs)
        self.n_class = n_class
        self.label_distribution = self.add_weight(name='ld',
                                                  initializer='zeros',
                                                  aggregation=VariableAggregation.NONE,
                                                  shape=(self.n_class,))

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = mo.cast(y_true, 'int32')
        y = mo.argmax(y_true, axis=1)
        label_distrib = mo.bincount(mo.cast(y, 'int32'))
        self.label_distribution.assign(mo.cast(label_distrib, 'float32'))

    def result(self):
        return self.label_distribution

    def reset_states(self):
        self.label_distribution.assign([0] * self.n_class)
class DRWLearningRateSchedule(keras.callbacks.History):
    """
    Implements the Deferred Re-Weighting (DRW) strategy from
    [Kaidi Cao, et al. "Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss." (2019)]
    (https://arxiv.org/abs/1906.07413)

    To be used together with the LabelDistribution metric above: pass the metric to
    `model.compile(..., metrics=[LabelDistribution(n_class)])` and this scheduler to
    `model.fit(..., callbacks=[DRWLearningRateSchedule(.01)])`.
    """

    def __init__(self, base_lr, ld_metric='batch_label_distribution'):
        super(DRWLearningRateSchedule, self).__init__()
        self.base_lr = base_lr
        self.ld_metric = ld_metric   # name of the LabelDistribution metric

    def on_batch_end(self, batch, logs=None):
        ld = logs.get(self.ld_metric)                      # the per-batch label distribution
        current_lr = K.get_value(self.model.optimizer.lr)
        # example below of updating the optimizer's learning rate
        K.set_value(self.model.optimizer.lr, current_lr * (1 / math_ops.reduce_sum(ld)))
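For context, a minimal sketch of how the two pieces might be wired together (model, n_class, x_train and y_train are placeholders, not part of the original answer):

# Hypothetical wiring, assuming `model` is an already-built classifier with n_class outputs
# and (x_train, y_train) is the one-hot encoded training data.
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss='categorical_crossentropy',
              metrics=[LabelDistribution(n_class=n_class)])   # exposes 'batch_label_distribution' in logs

model.fit(x_train, y_train,
          batch_size=64,
          epochs=10,
          callbacks=[DRWLearningRateSchedule(base_lr=0.01)])  # reads the metric in on_batch_end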
Best Answer

Keras learning rate adaptation based on the loss

After some research I found this: instead of triggering a decay, you can also define another function or value for the learning rate.
from __future__ import absolute_import
from __future__ import print_function

import keras
from keras import backend as K
import numpy as np


class LossLearningRateScheduler(keras.callbacks.History):
    """
    A learning rate scheduler that relies on changes in loss function
    value to dictate whether learning rate is decayed or not.

    LossLearningRateScheduler has the following properties:

    base_lr: the starting learning rate
    lookback_epochs: the number of epochs in the past to compare with the loss function at the current epoch to determine if progress is being made.
    decay_threshold / decay_multiple: if loss function has not improved by a factor of decay_threshold * lookback_epochs, then decay_multiple will be applied to the learning rate.
    spike_epochs: list of the epoch numbers where you want to spike the learning rate.
    spike_multiple: the multiple applied to the current learning rate for a spike.
    """

    def __init__(self, base_lr, lookback_epochs, spike_epochs=None, spike_multiple=10,
                 decay_threshold=0.002, decay_multiple=0.5, loss_type='val_loss'):
        super(LossLearningRateScheduler, self).__init__()
        self.base_lr = base_lr
        self.lookback_epochs = lookback_epochs
        self.spike_epochs = spike_epochs
        self.spike_multiple = spike_multiple
        self.decay_threshold = decay_threshold
        self.decay_multiple = decay_multiple
        self.loss_type = loss_type

    def on_epoch_begin(self, epoch, logs=None):
        if len(self.epoch) > self.lookback_epochs:
            current_lr = K.get_value(self.model.optimizer.lr)
            target_loss = self.history[self.loss_type]
            loss_diff = target_loss[-int(self.lookback_epochs)] - target_loss[-1]

            if loss_diff <= np.abs(target_loss[-1]) * (self.decay_threshold * self.lookback_epochs):
                print(' '.join(('Changing learning rate from', str(current_lr), 'to', str(current_lr * self.decay_multiple))))
                K.set_value(self.model.optimizer.lr, current_lr * self.decay_multiple)
                current_lr = current_lr * self.decay_multiple
            else:
                print(' '.join(('Learning rate:', str(current_lr))))

            if self.spike_epochs is not None and len(self.epoch) in self.spike_epochs:
                print(' '.join(('Spiking learning rate from', str(current_lr), 'to', str(current_lr * self.spike_multiple))))
                K.set_value(self.model.optimizer.lr, current_lr * self.spike_multiple)
        else:
            print(' '.join(('Setting learning rate to', str(self.base_lr))))
            K.set_value(self.model.optimizer.lr, self.base_lr)

        return K.get_value(self.model.optimizer.lr)


def main():
    return


if __name__ == '__main__':
    main()
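A minimal usage sketch, assuming a compiled Keras model and a validation split so that 'val_loss' exists (all variable names here are placeholders):

# Hypothetical usage: halve the LR whenever val_loss has not improved enough
# over the last 3 epochs.
lr_callback = LossLearningRateScheduler(base_lr=0.01, lookback_epochs=3,
                                        decay_threshold=0.002, decay_multiple=0.5,
                                        loss_type='val_loss')

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=50,
          callbacks=[lr_callback])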
Regarding "python - Is it possible to update the learning rate, per batch, based on the batch's label (y_true) distribution?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61859671/