
python - TensorFlow 2.0 : display progress bar in custom training loop

Reposted · Author: 行者123 · Updated: 2023-12-04 11:51:03

I'm training a CNN for an audio classification task, using TensorFlow 2.0 RC with a custom training loop (as described in this guide on its official site). It would be really convenient to have a nice progress bar, similar to the one shown by the usual Keras model.fit.

Here is an outline of my training code (I'm using 4 GPUs with a mirrored distribution strategy):

import gc
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

distr_train_dataset = strategy.experimental_distribute_dataset(train_dataset)

if valid_dataset:
    distr_valid_dataset = strategy.experimental_distribute_dataset(valid_dataset)

with strategy.scope():

    model = build_model()  # build the model

    optimizer = ...        # define optimizer
    train_loss = ...       # define training loss (per-example)
    train_mean_loss = ...  # running mean of the training loss
    train_metrics_1 = ...  # AUC-ROC
    train_metrics_2 = ...  # AUC-PR
    val_loss = ...         # running mean of the validation loss
    val_metrics_1 = ...    # AUC-ROC for validation
    val_metrics_2 = ...    # AUC-PR for validation

# rescale the loss by the batch size
def compute_loss(labels, predictions):
    per_example_loss = train_loss(labels, predictions)
    return per_example_loss / config.batch_size

def train_step(batch):
    audio_batch, label_batch = batch
    with tf.GradientTape() as tape:
        logits = model(audio_batch)
        loss = compute_loss(label_batch, logits)
    variables = model.trainable_variables
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))

    train_metrics_1.update_state(label_batch, logits)
    train_metrics_2.update_state(label_batch, logits)
    train_mean_loss.update_state(loss)
    return loss

def valid_step(batch):
    audio_batch, label_batch = batch
    logits = model(audio_batch, training=False)
    loss = compute_loss(label_batch, logits)

    val_metrics_1.update_state(label_batch, logits)
    val_metrics_2.update_state(label_batch, logits)
    val_loss.update_state(loss)
    return loss

@tf.function
def distributed_train(dataset):
    num_batches = 0
    for batch in dataset:
        num_batches += 1
        strategy.experimental_run_v2(train_step, args=(batch,))
        # print progress here
        tf.print('Step', num_batches,
                 '; Loss', train_mean_loss.result(),
                 '; ROC_AUC', train_metrics_1.result(),
                 '; PR_AUC', train_metrics_2.result())
    gc.collect()

@tf.function
def distributed_valid(dataset):
    for batch in dataset:
        strategy.experimental_run_v2(valid_step, args=(batch,))
    gc.collect()

for epoch in range(epochs):
    distributed_train(distr_train_dataset)
    gc.collect()
    train_metrics_1.reset_states()
    train_metrics_2.reset_states()
    train_mean_loss.reset_states()

    if valid_dataset:
        distributed_valid(distr_valid_dataset)
        gc.collect()
        val_metrics_1.reset_states()
        val_metrics_2.reset_states()
        val_loss.reset_states()

Here train_dataset and valid_dataset are two tf.data.TFRecordDataset objects generated with the usual tf.data input pipeline.
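For context, a minimal sketch of such a TFRecord input pipeline is below. The feature names (`audio`, `label`), shapes, and the tiny on-the-fly TFRecord file are illustrative assumptions, not taken from the question:

```python
# Hypothetical sketch of the "usual" tf.data TFRecord pipeline, assuming each
# example stores a raw float32 audio clip and an int64 label.
import numpy as np
import tensorflow as tf

# Write a tiny TFRecord file so the example is self-contained.
path = "example.tfrecord"
with tf.io.TFRecordWriter(path) as writer:
    for label in (0, 1):
        audio = np.random.rand(16).astype("float32")
        ex = tf.train.Example(features=tf.train.Features(feature={
            "audio": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[audio.tobytes()])),
            "label": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[label])),
        }))
        writer.write(ex.SerializeToString())

feature_spec = {
    "audio": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse(serialized):
    # Deserialize one record and decode the raw bytes back to float32.
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    audio = tf.io.decode_raw(parsed["audio"], tf.float32)
    return audio, parsed["label"]

train_dataset = (tf.data.TFRecordDataset(path)
                 .map(parse, num_parallel_calls=tf.data.AUTOTUNE)
                 .batch(2)
                 .prefetch(tf.data.AUTOTUNE))
```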

TensorFlow provides a very nice tf.keras.utils.Progbar (it is indeed what you see when training with model.fit). I've looked at its source code, and it relies on numpy, so I can't use it in place of my tf.print() statements (which execute in graph mode).

How can I implement a similar progress bar in my custom training loop (whose training function runs in graph mode)?

As a bonus: how does model.fit display a progress bar in the first place?

Best Answer

A progress bar for a custom training loop can be generated with the following code:

from tensorflow.keras.utils import Progbar
import time
import numpy as np

metrics_names = ['acc', 'pr']

num_epochs = 5
num_training_samples = 100
batch_size = 10

for i in range(num_epochs):
    print("\nepoch {}/{}".format(i + 1, num_epochs))

    pb_i = Progbar(num_training_samples, stateful_metrics=metrics_names)

    for j in range(num_training_samples // batch_size):

        time.sleep(0.3)

        values = [('acc', np.random.random(1)), ('pr', np.random.random(1))]

        pb_i.add(batch_size, values=values)
Output:
epoch 1/5

100/100 [==============================] - 3s 30ms/step - acc: 0.2169 - pr: 0.9011

epoch 2/5

100/100 [==============================] - 3s 30ms/step - acc: 0.7815 - pr: 0.4900

epoch 3/5

100/100 [==============================] - 3s 30ms/step - acc: 0.8003 - pr: 0.9292

epoch 4/5

100/100 [==============================] - 3s 30ms/step - acc: 0.8280 - pr: 0.9113

epoch 5/5

100/100 [==============================] - 3s 30ms/step - acc: 0.8497 - pr: 0.1929
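The snippet above shows Progbar in isolation. A minimal sketch of wiring it into a custom training loop follows: keep the per-batch step inside @tf.function so it runs in graph mode, but drive the epoch/batch loop eagerly so Progbar can read metric values as numpy. The model, data, and metric names here are illustrative placeholders, not from the question:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.utils import Progbar

num_samples, batch_size, epochs = 20, 5, 2
x = np.random.rand(num_samples, 8).astype("float32")
y = np.random.randint(0, 2, size=(num_samples, 1)).astype("float32")
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size)

model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.BinaryCrossentropy()
mean_loss = tf.keras.metrics.Mean()
roc_auc = tf.keras.metrics.AUC()

@tf.function  # the heavy per-batch work still runs in graph mode
def train_step(audio_batch, label_batch):
    with tf.GradientTape() as tape:
        preds = model(audio_batch, training=True)
        loss = loss_fn(label_batch, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    mean_loss.update_state(loss)
    roc_auc.update_state(label_batch, preds)

for epoch in range(epochs):
    print("\nepoch {}/{}".format(epoch + 1, epochs))
    pb = Progbar(num_samples, stateful_metrics=["loss", "roc_auc"])
    for audio_batch, label_batch in dataset:  # eager loop: numpy is available
        train_step(audio_batch, label_batch)  # graph-mode step
        pb.add(batch_size, values=[("loss", mean_loss.result().numpy()),
                                   ("roc_auc", roc_auc.result().numpy())])
    mean_loss.reset_state()  # reset_states() on older TF versions
    roc_auc.reset_state()
```

With MirroredStrategy the same pattern should apply: wrap only the per-replica step in @tf.function (via strategy.run, the current name of experimental_run_v2) and keep the batch loop eager so Progbar can be updated between steps.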

A similar question about "python - TensorFlow 2.0: display progress bar in custom training loop" can be found on Stack Overflow: https://stackoverflow.com/questions/57971007/
