
python-3.x - How to add BatchNormalization when using SWA (stochastic weight averaging)?


I am a beginner in deep learning and PyTorch.

I don't understand how to use BatchNormalization when using SWA.

pytorch.org says in https://pytorch.org/blog/stochastic-weight-averaging-in-pytorch/ :

Note that the SWA averages of the weights are never used to make predictions during training, and so the batch normalization layers do not have the activation statistics computed after you reset the weights of your model with opt.swap_swa_sgd()

Does this mean it is appropriate to add a BatchNormalization layer after using SWA?

# this is what I think it means

# for example
opt = torchcontrib.optim.SWA(base_opt)
for i in range(100):
    opt.zero_grad()
    loss_fn(model(input), target).backward()
    opt.step()
    if i > 10 and i % 5 == 0:
        opt.update_swa()
opt.swap_swa_sgd()


# save the model once
torch.save(model, "swa_model.pt")

# load the model
saved_model = torch.load("swa_model.pt")

# does this mean adding a BatchNormalization layer??
model2 = saved_model
model2.add_module("Batch1", nn.BatchNorm1d(10))

# decay the learning rate further
learning_rate = 0.005
optimizer = torch.optim.SGD(model2.parameters(), lr=learning_rate)

# train the model again
for epoch in range(num_epochs):
    loss = train(train_loader)
    val_loss, val_acc = valid(test_loader)

Thank you very much for your reply.

Following your advice,

I tried to make an example model that adds optimizer.bn_update():

# add optimizer.bn_update() to the model

criterion = nn.CrossEntropyLoss()
learning_rate = 0.01

base_opt = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer = SWA(base_opt, swa_start=10, swa_freq=5, swa_lr=0.05)

def train(train_loader):
    # mode: train
    model.train()
    running_loss = 0
    for batch_idx, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(images)
        # loss
        loss = criterion(outputs, labels)
        running_loss += loss.item()
        loss.backward()
        optimizer.step()

    optimizer.swap_swa_sgd()
    train_loss = running_loss / len(train_loader)

    return train_loss


def valid(test_loader):

    model.eval()
    running_loss = 0
    correct = 0
    total = 0
    # torch.no_grad
    with torch.no_grad():
        for batch_idx, (images, labels) in enumerate(test_loader):
            outputs = model(images)

            loss = criterion(outputs, labels)
            running_loss += loss.item()

            _, predicted = torch.max(outputs, 1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)

    val_loss = running_loss / len(test_loader)
    val_acc = float(correct) / total

    return val_loss, val_acc



num_epochs = 30

loss_list = []
val_loss_list = []
val_acc_list = []
for epoch in range(num_epochs):
    loss = train(train_loader)
    val_loss, val_acc = valid(test_loader)
    optimizer.bn_update(train_loader, model)
    print('epoch %d, loss: %.4f val_loss: %.4f val_acc: %.4f'
          % (epoch, loss, val_loss, val_acc))

    # logging
    loss_list.append(loss)
    val_loss_list.append(val_loss)
    val_acc_list.append(val_acc)

# optimizer.bn_update()
optimizer.bn_update(train_loader, model)

# go on evaluating the model...

Best Answer

What the documentation is telling you is that SWA computes an average of the weights, but those averaged weights are never used for prediction during training, so the batch normalization layers never see them. That means the BN layers never compute their running statistics for the averaged weights (they never get the chance to), which matters because those averaged weights are exactly what is used at actual prediction time (i.e. not during training).
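
To make the mechanism concrete, here is a small standalone sketch (not part of the original answer) showing that a BatchNorm layer only updates its running statistics when data is actually passed through it in training mode; since the SWA average is kept inside the optimizer and never used in a forward pass during training, no matching statistics are ever accumulated for it:

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(10)
print(bn.running_mean)         # all zeros: no data has been seen yet

x = torch.randn(32, 10) + 3.0  # batch whose features have mean ~3

bn.train()
_ = bn(x)                      # a forward pass in train mode updates the stats
print(bn.running_mean)         # has moved towards ~3

bn.eval()
_ = bn(torch.randn(32, 10))    # a forward pass in eval mode does not update them
print(bn.running_mean)         # unchanged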

This means they assume you have batch normalization layers in your model and want to train it with SWA. For the reason above, this is (more or less) not straightforward.

One approach is the following:

To compute the activation statistics you can just make a forward pass on your training data using the SWA model once the training is finished.
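
As a rough illustration of that approach, the sketch below (a hypothetical helper, not part of torchcontrib) resets and refreshes the BatchNorm statistics with one pass over the training data. It assumes opt.swap_swa_sgd() has already copied the SWA weights into model and that train_loader yields (images, labels) batches; unlike the library helper, it keeps the default BN momentum, so it only approximates the cumulative average that bn_update computes:

import torch

def recompute_bn_stats(model, train_loader):
    # reset the stored statistics of every BatchNorm layer
    for module in model.modules():
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
            module.reset_running_stats()

    # forward passes in train mode refresh running_mean / running_var;
    # no_grad keeps this cheap since no gradients are needed
    model.train()
    with torch.no_grad():
        for images, _ in train_loader:
            model(images)

# after opt.swap_swa_sgd() has copied the SWA weights into the model:
# recompute_bn_stats(model, train_loader)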

Or you can use the helper function they provide:

In the SWA class we provide a helper function opt.bn_update(train_loader, model). It updates the activation statistics for every batch normalization layer in the model by making a forward pass on the train_loader data loader. You only need to call this function once in the end of training.

If you are using PyTorch's DataLoader class, you can simply pass your (trained) model and the training loader to the bn_update function, and it updates all batch normalization statistics for you. You only need to call this function once, at the end of training.


Steps to take:

  1. Train your model, which contains batch normalization layers, using SWA
  2. After your model has finished training, call opt.bn_update(train_loader, model) with your training data and your trained model (see the sketch after these steps)
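
Putting both steps together, a minimal end-of-training sequence could look like the sketch below; it reuses model and train_loader from the question's own snippets together with the torchcontrib auto-mode SWA optimizer, so treat it as an outline rather than a drop-in implementation:

import torch
import torchcontrib

criterion = torch.nn.CrossEntropyLoss()
num_epochs = 30

# wrap the base optimizer with SWA in "auto" mode, as in the question
base_opt = torch.optim.SGD(model.parameters(), lr=0.1)
opt = torchcontrib.optim.SWA(base_opt, swa_start=10, swa_freq=5, swa_lr=0.05)

# step 1: ordinary training; the model already contains its BatchNorm layers
model.train()
for epoch in range(num_epochs):
    for images, labels in train_loader:
        opt.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        opt.step()

# step 2: after training, swap the averaged SWA weights into the model once...
opt.swap_swa_sgd()
# ...and recompute the BatchNorm statistics for those averaged weights
opt.bn_update(train_loader, model)

# the model is now ready for evaluation or saving
torch.save(model.state_dict(), "swa_model.pt")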

Regarding python-3.x - How to add BatchNormalization when using SWA (stochastic weight averaging), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57406061/
