pytorch - Huggingface BERT showing poor accuracy / F1 score [pytorch]

I'm trying BertForSequenceClassification for a simple article classification task.

No matter how I train it (freezing all layers except the classification layer, making all layers trainable, making only the last k layers trainable), I always get an almost random accuracy score. My model never gets above 24-26% training accuracy (I only have 5 classes in my dataset).

I'm not sure what I did wrong while designing/training the model. I tried the model on multiple datasets, and every time it gives the same random baseline accuracy.

Dataset I used: BBC articles (5 classes)

https://github.com/zabir-nabil/pytorch-nlp/tree/master/bbc

Consists of 2225 documents from the BBC news website corresponding to stories in five topical areas from 2004-2005. Natural Classes: 5 (business, entertainment, politics, sport, tech)



I've included the model section and the training section, the most important parts (to avoid any irrelevant details). I've also included the full source code + data in case that's useful for reproducibility.

My guess is that something is wrong either with the way I designed the network or with the way I'm passing the attention_masks/labels to the model. Also, the token length of 512 shouldn't be a problem, since most of the texts are shorter than 512 tokens (the average length is under 300).
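For context, here is a minimal sketch of how an input_ids/attention_mask pair is typically built with the Hugging Face tokenizer (illustrative only, assuming a reasonably recent transformers version; my actual preprocessing is in the linked notebook):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# pad/truncate every article to 512 tokens and build the matching attention mask
enc = tokenizer("Some BBC article text ...",
                max_length = 512,
                padding = 'max_length',
                truncation = True,
                return_tensors = 'pt')

input_ids = enc['input_ids']            # shape (1, 512)
attention_mask = enc['attention_mask']  # 1 for real tokens, 0 for padding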

Model code:

import torch
from torch import nn
from transformers import BertForSequenceClassification  # this import was missing above

class BertClassifier(nn.Module):
    def __init__(self):
        super(BertClassifier, self).__init__()
        # 5 output labels, as we have 5 classes
        self.bert = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels = 5)

        # we want our output as a probability, so in evaluation mode we pass the logits through a softmax layer
        self.softmax = torch.nn.Softmax(dim = 1) # last dimension

    def forward(self, x, attn_mask = None, labels = None):
        if self.training == True:
            # with labels provided, the model returns a (loss, logits) tuple
            # (transformers < 4.x behavior, or return_dict=False in newer versions)
            loss = self.bert(x, attention_mask = attn_mask, labels = labels)
            return loss

        if self.training == False: # in evaluation mode
            x = self.bert(x)       # returns a (logits,) tuple
            x = self.softmax(x[0])
            return x

    def freeze_layers(self, last_trainable = 1):
        # freeze everything except the last `last_trainable` parameter tensors
        # (the classification layer + a few transformer blocks)
        for layer in list(self.bert.parameters())[:-last_trainable]:
            layer.requires_grad = False


# create our model
bertclassifier = BertClassifier()
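As a quick sanity check of the two forward paths, here is an illustrative sketch with dummy inputs (not part of the actual pipeline; it assumes the pre-4.x tuple return noted in the comments above):

# dummy batch: 2 sequences of 512 token ids from bert-base-uncased's 30522-word vocab
dummy_ids = torch.randint(0, 30522, (2, 512))
dummy_mask = torch.ones(2, 512, dtype = torch.long)
dummy_labels = torch.tensor([0, 3])

bertclassifier.train()
loss, logits = bertclassifier(dummy_ids, dummy_mask, dummy_labels)  # (loss, logits) tuple
print(loss.item(), logits.shape)       # scalar loss, logits of shape (2, 5)

bertclassifier.eval()
with torch.no_grad():
    probs = bertclassifier(dummy_ids)  # softmax probabilities of shape (2, 5)
print(probs.sum(dim = 1))              # each row sums to ~1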

Training code:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # cuda for gpu acceleration

# optimizer
optimizer = torch.optim.Adam(bertclassifier.parameters(), lr=0.001)

epochs = 15

bertclassifier.to(device) # move the model to GPU if possible

# metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

train_losses = []
train_metrics = {'acc': [], 'f1': []}
test_metrics = {'acc': [], 'f1': []}

# progress bar
from tqdm import tqdm_notebook

for e in tqdm_notebook(range(epochs)):
    train_loss = 0.0
    train_acc = 0.0
    train_f1 = 0.0
    batch_cnt = 0

    bertclassifier.train()

    print(f'epoch: {e+1}')

    for i_batch, (X, X_mask, y) in tqdm_notebook(enumerate(bbc_dataloader_train)):
        X = X.to(device)
        X_mask = X_mask.to(device)
        y = y.to(device)

        optimizer.zero_grad()

        # in training mode the model returns a (loss, logits) tuple
        loss, y_pred = bertclassifier(X, X_mask, y)

        train_loss += loss.item()
        loss.backward()
        optimizer.step()

        y_pred = torch.argmax(y_pred, dim = -1)

        # update metrics
        train_acc += accuracy_score(y.cpu().detach().numpy(), y_pred.cpu().detach().numpy())
        train_f1 += f1_score(y.cpu().detach().numpy(), y_pred.cpu().detach().numpy(), average = 'micro')
        batch_cnt += 1

    print(f'train loss: {train_loss/batch_cnt}')
    train_losses.append(train_loss/batch_cnt)
    train_metrics['acc'].append(train_acc/batch_cnt)
    train_metrics['f1'].append(train_f1/batch_cnt)

    test_loss = 0.0
    test_acc = 0.0
    test_f1 = 0.0
    batch_cnt = 0

    bertclassifier.eval()
    with torch.no_grad():
        for i_batch, (X, y) in enumerate(bbc_dataloader_test):
            X = X.to(device)
            y = y.to(device)

            y_pred = bertclassifier(X) # in eval mode we get the softmax output, so no need to index

            y_pred = torch.argmax(y_pred, dim = -1)

            # update metrics
            test_acc += accuracy_score(y.cpu().detach().numpy(), y_pred.cpu().detach().numpy())
            test_f1 += f1_score(y.cpu().detach().numpy(), y_pred.cpu().detach().numpy(), average = 'micro')
            batch_cnt += 1

    test_metrics['acc'].append(test_acc/batch_cnt)
    test_metrics['f1'].append(test_f1/batch_cnt)
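One side note on the metrics: for single-label multiclass classification, micro-averaged F1 is mathematically identical to accuracy, so the two tracked metrics above will always be equal. A tiny check:

from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 2, 4]
y_hat  = [0, 2, 2, 2, 4]
# micro-F1 counts every prediction as one TP or one FP+FN pair, so it reduces to accuracy
assert f1_score(y_true, y_hat, average = 'micro') == accuracy_score(y_true, y_hat)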

The full source code with the dataset is available here: https://github.com/zabir-nabil/pytorch-nlp/blob/master/bert-article-classification.ipynb

Update:

After inspecting the predictions, it seems the model almost always predicts class 0:

bertclassifier.eval()
with torch.no_grad():
    for i_batch, (X, y) in enumerate(bbc_dataloader_test):
        X = X.to(device)
        y = y.to(device)

        y_pred = bertclassifier(X) # in eval mode we get the softmax output, so no need to index

        y_pred = torch.argmax(y_pred, dim = -1)

        print(y)
        print(y_pred)
        print('--------------------')
tensor([4, 2, 2, 3], device='cuda:0')
tensor([0, 0, 0, 0], device='cuda:0')
--------------------
tensor([3, 0, 3, 1], device='cuda:0')
tensor([0, 0, 0, 0], device='cuda:0')
--------------------
tensor([0, 0, 0, 2], device='cuda:0')
tensor([0, 0, 0, 0], device='cuda:0')
--------------------
tensor([3, 4, 4, 3], device='cuda:0')
tensor([0, 0, 0, 0], device='cuda:0')
--------------------
tensor([4, 3, 2, 0], device='cuda:0')
tensor([0, 0, 0, 0], device='cuda:0')
--------------------
tensor([0, 3, 3, 1], device='cuda:0')
tensor([0, 0, 0, 0], device='cuda:0')
--------------------
tensor([1, 1, 4, 3], device='cuda:0')
tensor([0, 0, 0, 0], device='cuda:0')
--------------------
tensor([0, 0, 0, 1], device='cuda:0')
tensor([0, 0, 0, 0], device='cuda:0')
--------------------
...
(every remaining batch follows the same pattern: varied true labels, all-zero predictions)

In fact, the model always predicts the same output [0.2270, 0.1855, 0.2131, 0.1877, 0.1867] for any input, as if it has learned nothing at all.

This is strange, because my dataset is not imbalanced:

Counter({'politics': 417,
         'business': 510,
         'entertainment': 386,
         'tech': 401,
         'sport': 511})

Best Answer

After some digging, I found that the main culprit was the learning rate: 0.001 is far too high for fine-tuning BERT. When I lowered the learning rate from 0.001 to 1e-5, both my training and test accuracy reached 95%.
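The fix is a one-line change to the optimizer setup above (shown here with plain Adam; PyTorch's AdamW with decoupled weight decay is also commonly used for BERT fine-tuning):

# fine-tuning BERT needs a much smaller learning rate than training from scratch
optimizer = torch.optim.Adam(bertclassifier.parameters(), lr = 1e-5)

# alternatively, with decoupled weight decay:
# optimizer = torch.optim.AdamW(bertclassifier.parameters(), lr = 1e-5, weight_decay = 0.01)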

When BERT is fine-tuned, all layers are trained - this is quite different from fine-tuning in a lot of other ML models, but it matches what was described in the paper and works quite well (as long as you only fine-tune for a few epochs - it's very easy to overfit if you fine-tune the whole model for a long time on a small amount of data!)



Source: https://github.com/huggingface/transformers/issues/587

Best result is found when all the layers are trained with a really small learning rate.



Source: https://github.com/uzaymacar/comparatively-finetuning-bert

Regarding pytorch - Huggingface BERT showing poor accuracy / F1 score [pytorch], we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61969783/
