
machine-learning - BERT HuggingFace giving NaN loss


I am trying to fine-tune BERT for a text classification task, but I am getting a NaN loss and cannot figure out why.

First, I define a BERT tokenizer and then tokenize my text:

# Imports used by the snippets below
from transformers import DistilBertTokenizer, RobertaTokenizer
from tqdm import tqdm
import numpy as np
import pandas as pd
import tensorflow as tf

distil_bert = 'distilbert-base-uncased'

tokenizer = DistilBertTokenizer.from_pretrained(distil_bert, do_lower_case=True, add_special_tokens=True,
                                                max_length=128, pad_to_max_length=True)

def tokenize(sentences, tokenizer):
    input_ids, input_masks, input_segments = [], [], []
    for sentence in tqdm(sentences):
        inputs = tokenizer.encode_plus(sentence, add_special_tokens=True, max_length=25, pad_to_max_length=True,
                                       return_attention_mask=True, return_token_type_ids=True)
        input_ids.append(inputs['input_ids'])
        input_masks.append(inputs['attention_mask'])
        input_segments.append(inputs['token_type_ids'])

    return np.asarray(input_ids, dtype='int32'), np.asarray(input_masks, dtype='int32'), np.asarray(input_segments, dtype='int32')

train = pd.read_csv('train_dataset.csv')
d = train['text']
input_ids, input_masks, input_segments = tokenize(d, tokenizer)
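
A minimal check, assuming the arrays come from the tokenize call above, to confirm their shapes line up with the 25-token Input layers defined later:

# Each array should have shape (num_samples, 25), matching Input(shape=(25,)) below
print(input_ids.shape, input_masks.shape, input_segments.shape)
print(input_ids[0])   # padded/truncated token ids for the first sentence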

Next, I load my integer labels, which are 0, 1, 2 and 3:

d_y = train['label']
0 0
1 1
2 0
3 2
4 0
5 0
6 0
7 0
8 3
9 1
Name: label, dtype: int64
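
Since sparse_categorical_crossentropy expects integer class ids in the range [0, num_classes), a quick check on the labels (a minimal sketch, assuming d_y is the Series shown above) looks like:

labels = np.asarray(d_y, dtype='int32')
# Expected: [0, 1, 2, 3] for a model with 4 output units
print(sorted(set(labels.tolist())))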

Then I load the pretrained transformer model and put layers on top of it. I use sparse categorical crossentropy loss when compiling the model:

from transformers import TFDistilBertForSequenceClassification, DistilBertConfig, AutoTokenizer, TFDistilBertModel

distil_bert = 'distilbert-base-uncased'
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.0000001)

config = DistilBertConfig(num_labels=4, dropout=0.2, attention_dropout=0.2)
config.output_hidden_states = False
transformer_model = TFDistilBertModel.from_pretrained(distil_bert, config = config)

input_ids_in = tf.keras.layers.Input(shape=(25,), name='input_token', dtype='int32')
input_masks_in = tf.keras.layers.Input(shape=(25,), name='masked_token', dtype='int32')

embedding_layer = transformer_model(input_ids_in, attention_mask=input_masks_in)[0]
X = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(50, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))(embedding_layer)
X = tf.keras.layers.GlobalMaxPool1D()(X)
X = tf.keras.layers.Dense(50, activation='relu')(X)
X = tf.keras.layers.Dropout(0.2)(X)
X = tf.keras.layers.Dense(1, activation='softmax')(X)
model = tf.keras.Model(inputs=[input_ids_in, input_masks_in], outputs = X)

for layer in model.layers[:3]:
    layer.trainable = False

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'],
              )

Finally, I run the model, using the previously tokenized input_ids and input_masks as inputs, and get a NaN loss after the first epoch:

model.fit(x=[input_ids, input_masks], y = d_y, epochs=3)

Epoch 1/3
20/20 [==============================] - 4s 182ms/step - loss: 0.9714 - sparse_categorical_accuracy: 0.6153
Epoch 2/3
20/20 [==============================] - 0s 19ms/step - loss: nan - sparse_categorical_accuracy: 0.5714
Epoch 3/3
20/20 [==============================] - 0s 20ms/step - loss: nan - sparse_categorical_accuracy: 0.5714
<tensorflow.python.keras.callbacks.History at 0x7fee0e220f60>

Edit: the model computes a loss during the first epoch, but it starts returning NaN in the second epoch. What is causing this problem???

Does anyone know what I am doing wrong? All suggestions are welcome!

Best Answer

The problem is here:

X = tf.keras.layers.Dense(1, activation='softmax')(X)

At the end of the network you have only a single neuron, corresponding to a single class. The output probability is always 100% for class 0. If you have classes 0, 1, 2 and 3, you need 4 outputs at the end.
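
A minimal sketch of the corrected classification head, assuming the rest of the model is built exactly as in the question: the final Dense layer needs 4 units so that sparse_categorical_crossentropy receives one probability per class.

X = tf.keras.layers.Dense(50, activation='relu')(X)
X = tf.keras.layers.Dropout(0.2)(X)
# One output unit per class (labels 0, 1, 2, 3), softmax over the 4 classes
X = tf.keras.layers.Dense(4, activation='softmax')(X)
model = tf.keras.Model(inputs=[input_ids_in, input_masks_in], outputs=X)

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])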

Regarding machine-learning - BERT HuggingFace giving NaN loss, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62436178/
