
python - Gradient accumulation with a custom model in TF.Keras?


Please add at least a brief comment with your thoughts so that I can improve my query. Thank you. :)

I am trying to train a tf.keras model with gradient accumulation (GA). However, I don't want to use it in a custom training loop (like this one), but instead customize the .fit() method by overriding train_step. Is that possible? How can it be done? The reason is that if we want the benefits of Keras built-in functionality such as fit and callbacks, we don't want to write a custom training loop, but at the same time, if we need to override train_step for some reason (such as GA or anything else), we can customize the fit method and still take advantage of those built-ins.
Also, I know the advantages of using GA, but what are the main disadvantages of using it? Why is it not a default feature of the framework, but only an optional one?

import tensorflow as tf

# overriding train step
# my attempt
# it's not appropriately implemented
# and needs to be fixed
class CustomTrainStep(tf.keras.Model):
    def __init__(self, n_gradients, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.n_gradients = n_gradients
        self.gradient_accumulation = [tf.zeros_like(this_var)
                                      for this_var in self.trainable_variables]

    def train_step(self, data):
        x, y = data
        batch_size = tf.cast(tf.shape(x)[0], tf.float32)
        # Gradient Tape
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        # Calculate batch gradients
        gradients = tape.gradient(loss, self.trainable_variables)
        # Accumulate batch gradients
        accum_gradient = [(acum_grad + grad) for acum_grad, grad in
                          zip(self.gradient_accumulation, gradients)]
        accum_gradient = [this_grad / batch_size for this_grad in accum_gradient]
        # apply accumulated gradients
        self.optimizer.apply_gradients(zip(accum_gradient, self.trainable_variables))
        # TODO: reset self.gradient_accumulation
        # update metrics
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
Please run and check with the following toy setup.
# Model
size = 32
input = tf.keras.Input(shape=(size, size, 3))
efnet = tf.keras.applications.DenseNet121(weights=None,
                                          include_top=False,
                                          input_tensor=input)
base_maps = tf.keras.layers.GlobalAveragePooling2D()(efnet.output)
base_maps = tf.keras.layers.Dense(units=10, activation='softmax',
                                  name='primary')(base_maps)
custom_model = CustomTrainStep(n_gradients=10, inputs=[input], outputs=[base_maps])

# bind all
custom_model.compile(
    loss = tf.keras.losses.CategoricalCrossentropy(),
    metrics = ['accuracy'],
    optimizer = tf.keras.optimizers.Adam())

# data
(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = tf.expand_dims(x_train, -1)
x_train = tf.repeat(x_train, 3, axis=-1)
x_train = tf.divide(x_train, 255)
x_train = tf.image.resize(x_train, [size, size])  # if we want to resize
y_train = tf.one_hot(y_train, depth=10)

# customized fit
custom_model.fit(x_train, y_train, batch_size=64, epochs=3, verbose=1)

Update
I found that some other people have also tried to achieve this and ended up with the same issue. One of them has a workaround, here, but it is too messy, and I think there should be a better approach.

Best Answer

Yes, it is possible to customize the .fit() method by overriding train_step without a custom training loop. The simple example below shows how to train a simple MNIST classifier with gradient accumulation:

import tensorflow as tf

# overriding train step with gradient accumulation:
# gradients are accumulated over n_gradients mini-batches,
# then applied once and reset
class CustomTrainStep(tf.keras.Model):
    def __init__(self, n_gradients, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.n_gradients = tf.constant(n_gradients, dtype=tf.int32)
        self.n_acum_step = tf.Variable(0, dtype=tf.int32, trainable=False)
        self.gradient_accumulation = [
            tf.Variable(tf.zeros_like(v, dtype=tf.float32), trainable=False)
            for v in self.trainable_variables]

    def train_step(self, data):
        self.n_acum_step.assign_add(1)

        x, y = data
        # Gradient Tape
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        # Calculate batch gradients
        gradients = tape.gradient(loss, self.trainable_variables)
        # Accumulate batch gradients
        for i in range(len(self.gradient_accumulation)):
            self.gradient_accumulation[i].assign_add(gradients[i])

        # If n_acum_step reaches n_gradients, apply the accumulated gradients
        # to update the variables; otherwise do nothing
        tf.cond(tf.equal(self.n_acum_step, self.n_gradients),
                self.apply_accu_gradients, lambda: None)

        # update metrics
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

    def apply_accu_gradients(self):
        # apply accumulated gradients
        self.optimizer.apply_gradients(
            zip(self.gradient_accumulation, self.trainable_variables))

        # reset the step counter and the accumulators
        self.n_acum_step.assign(0)
        for i in range(len(self.gradient_accumulation)):
            self.gradient_accumulation[i].assign(
                tf.zeros_like(self.trainable_variables[i], dtype=tf.float32))

# Model
input = tf.keras.Input(shape=(28, 28))
base_maps = tf.keras.layers.Flatten(input_shape=(28, 28))(input)
base_maps = tf.keras.layers.Dense(128, activation='relu')(base_maps)
base_maps = tf.keras.layers.Dense(units=10, activation='softmax', name='primary')(base_maps)
custom_model = CustomTrainStep(n_gradients=10, inputs=[input], outputs=[base_maps])

# bind all
custom_model.compile(
    loss = tf.keras.losses.CategoricalCrossentropy(),
    metrics = ['accuracy'],
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3))

# data
(x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = tf.divide(x_train, 255)
y_train = tf.one_hot(y_train, depth=10)

# customized fit
custom_model.fit(x_train, y_train, batch_size=6, epochs=3, verbose=1)
Output:
Epoch 1/3
10000/10000 [==============================] - 13s 1ms/step - loss: 0.5053 - accuracy: 0.8584
Epoch 2/3
10000/10000 [==============================] - 13s 1ms/step - loss: 0.1389 - accuracy: 0.9600
Epoch 3/3
10000/10000 [==============================] - 13s 1ms/step - loss: 0.0898 - accuracy: 0.9748
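
One design detail in the answer above: apply_accu_gradients applies the raw sum of the per-mini-batch gradients. Since Keras losses are mean-reduced within each mini-batch, that sum is roughly n_gradients times the gradient of one large batch; Adam is largely insensitive to this constant scaling, but other optimizers may not be. If you want the update to approximate a single batch of batch_size * n_gradients samples more closely, you could average before applying. A sketch of that variant (my own modification, not part of the original answer), replacing apply_accu_gradients in the class above:

    def apply_accu_gradients(self):
        # hypothetical variant: average the accumulated gradients so the update
        # approximates one batch of batch_size * n_gradients samples
        scale = tf.cast(self.n_gradients, tf.float32)
        averaged = [g / scale for g in self.gradient_accumulation]
        self.optimizer.apply_gradients(zip(averaged, self.trainable_variables))

        # reset the step counter and the accumulators, as in the original version
        self.n_acum_step.assign(0)
        for i in range(len(self.gradient_accumulation)):
            self.gradient_accumulation[i].assign(
                tf.zeros_like(self.trainable_variables[i], dtype=tf.float32))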
Pros:

Gradient accumulation is a mechanism to split the batch of samples — used for training a neural network — into several mini-batches of samples that will be run sequentially.


Because GA computes the loss and gradients after each mini-batch, but instead of updating the model parameters immediately, it waits and accumulates the gradients over consecutive batches, it can overcome memory limitations, i.e. it uses less memory to train the model as if a large batch size were used.

Example: If you run a gradient accumulation with steps of 5 and batch size of 4 images, it serves almost the same purpose as running with a batch size of 20 images.
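
To make that equivalence concrete, here is a minimal numerical sketch (my own toy example, not from the original post): with a sum-reduced loss, the gradient computed on one batch of 20 samples equals the sum of the gradients of five mini-batches of 4 samples. With the usual mean-reduced Keras losses, you would additionally divide the accumulated gradient by the number of accumulation steps.

import tensorflow as tf

# toy linear model and data (assumed for illustration only)
w = tf.Variable(tf.ones([3, 1]))
x = tf.random.normal([20, 3])
y = tf.random.normal([20, 1])

def loss_fn(xb, yb):
    # sum-reduced squared error, so gradients add up exactly across mini-batches
    return tf.reduce_sum(tf.square(tf.matmul(xb, w) - yb))

# gradient of one large batch of 20 samples
with tf.GradientTape() as tape:
    full_loss = loss_fn(x, y)
full_grad = tape.gradient(full_loss, w)

# accumulate gradients over 5 mini-batches of 4 samples
accum_grad = tf.zeros_like(w)
for i in range(0, 20, 4):
    with tf.GradientTape() as tape:
        mini_loss = loss_fn(x[i:i + 4], y[i:i + 4])
    accum_grad += tape.gradient(mini_loss, w)

# difference is ~0 (up to floating point error)
print(tf.reduce_max(tf.abs(full_grad - accum_grad)))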


We can also parallelize training while using GA, i.e. aggregate the gradients from multiple machines.
Things to consider:
This technique works so well that it is widely used. There are a few things to consider before using it, but I don't think they should be called drawbacks; after all, what GA does is turn 4 + 4 into 2 + 2 + 2 + 2.
If your machine already has enough memory for a larger batch size, you don't need GA, since an overly large batch size is known to generalize poorly, and GA will certainly run slower to reach the same effective batch size your machine's memory can already handle.
Reference:
What is Gradient Accumulation in Deep Learning?

Regarding python - Gradient accumulation with a custom model in TF.Keras?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66472201/
