
python - Getting sentence-level embeddings after fine-tuning BERT


I came across this page.

1) I want to get sentence-level embeddings (the embedding given by the [CLS] token) after fine-tuning is complete. How can I do that?

2) I also noticed that the code on that page takes a very long time to return results on the test data. Why is that? When I train the model it takes less time than when I try to get predictions on the test set.
From the code on that page, I did not use the block below:

test_InputExamples = test.apply(lambda x: bert.run_classifier.InputExample(guid=None,
                                                                            text_a = x[DATA_COLUMN],
                                                                            text_b = None,
                                                                            label = x[LABEL_COLUMN]), axis = 1)

test_features = bert.run_classifier.convert_examples_to_features(test_InputExamples, label_list, MAX_SEQ_LENGTH, tokenizer)

test_input_fn = run_classifier.input_fn_builder(
    features=test_features,
    seq_length=MAX_SEQ_LENGTH,
    is_training=False,
    drop_remainder=False)

estimator.evaluate(input_fn=test_input_fn, steps=None)

Instead, I just used the function below on my entire test data:
def getPrediction(in_sentences):
  labels = ["Negative", "Positive"]
  input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, label = 0) for x in in_sentences]  # here, "" is just a dummy label
  input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
  predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
  predictions = estimator.predict(predict_input_fn)
  return [(sentence, prediction['probabilities'], labels[prediction['labels']]) for sentence, prediction in zip(in_sentences, predictions)]
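
A call on the whole test set would look roughly like this (a sketch only; it assumes test is a pandas DataFrame and DATA_COLUMN holds the raw text, as in the linked tutorial):

pred_sentences = test[DATA_COLUMN].tolist()   # list of raw sentences
results = getPrediction(pred_sentences)
print(results[0])                             # (sentence, probabilities, predicted label)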

3) How can I get the prediction probabilities? Is there a way to use something like the Keras predict method?

Update 1

Update on question 2 -
Could you test the getPrediction function on 20,000 training examples? .... For me it takes much longer .. even longer than it took to train the model on 20,000 examples.

Best Answer

1) From the BERT documentation:

The output dictionary contains:

pooled_output: pooled output of the entire sequence with shape [batch_size, hidden_size].
sequence_output: representations of every token in the input sequence with shape [batch_size, max_sequence_length, hidden_size].

I have added the pooled_output vector, which corresponds to the [CLS] vector.

3) You are getting log probabilities. Just apply softmax to get normal probabilities.
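
Concretely, the 'probabilities' array returned by the original code is the output of log_softmax, so exponentiating it recovers the ordinary probabilities. A small sketch (the numbers are the ones from the sample prediction shown further below):

import numpy as np

log_probs = np.array([-4.0148855e-03, -5.5197663e+00])  # log-softmax output from the model
probs = np.exp(log_probs)                                # ordinary softmax probabilities
print(probs)        # ~[0.996, 0.004]
print(probs.sum())  # ~1.0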

Now all that is left to do is to make the model report them. I have kept the log probabilities as well, but they are no longer required.

See the code changes:
def create_model(is_predicting, input_ids, input_mask, segment_ids, labels,
                 num_labels):
  """Creates a classification model."""

  bert_module = hub.Module(
      BERT_MODEL_HUB,
      trainable=True)
  bert_inputs = dict(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids)
  bert_outputs = bert_module(
      inputs=bert_inputs,
      signature="tokens",
      as_dict=True)

  # Use "pooled_output" for classification tasks on an entire sentence.
  # Use "sequence_output" for token-level output.
  output_layer = bert_outputs["pooled_output"]

  pooled_output = output_layer

  hidden_size = output_layer.shape[-1].value

  # Create our own layer to tune for politeness data.
  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):

    # Dropout helps prevent overfitting
    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    log_probs = tf.nn.log_softmax(logits, axis=-1)
    probs = tf.nn.softmax(logits, axis=-1)

    # Convert labels into one-hot encoding
    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    predicted_labels = tf.squeeze(tf.argmax(log_probs, axis=-1, output_type=tf.int32))
    # If we're predicting, we want predicted labels and the probabilities.
    if is_predicting:
      return (predicted_labels, log_probs, probs, pooled_output)

    # If we're train/eval, compute loss between predicted and actual label
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)
    return (loss, predicted_labels, log_probs, probs, pooled_output)
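
One caveat: pooled_output is not the raw [CLS] hidden state; BERT produces it by passing the [CLS] vector through an extra dense layer with a tanh activation. If you specifically want the unpooled [CLS] vector, you could take the first token of sequence_output instead (a sketch under the same setup; this is not part of the original answer):

# Sketch: the raw (unpooled) [CLS] vector is the first token of sequence_output.
sequence_output = bert_outputs["sequence_output"]  # [batch_size, max_seq_length, hidden_size]
cls_vector = sequence_output[:, 0, :]              # [batch_size, hidden_size]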

Now add support for these values in model_fn_builder():
  # this should be changed in both places
  (predicted_labels, log_probs, probs, pooled_output) = create_model(
      is_predicting, input_ids, input_mask, segment_ids, label_ids, num_labels)

  # return dictionary of all the values you wanted
  predictions = {
      'log_probabilities': log_probs,
      'probabilities': probs,
      'labels': predicted_labels,
      'pooled_output': pooled_output
  }
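
Because create_model() now returns extra values, the unpacking has to be updated in both branches of model_fn (the train/eval branch and the predicting branch). In the predicting branch of the tutorial's model_fn, this dictionary is then handed to the EstimatorSpec that estimator.predict() consumes, roughly:

  # Sketch (based on the tutorial's model_fn): the predicting branch hands
  # the dictionary above to the EstimatorSpec used by estimator.predict().
  return tf.estimator.EstimatorSpec(mode, predictions=predictions)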

Adjust getPrediction() accordingly (a sketch follows after the sample output below), and in the end your prediction will look like this:
('That movie was absolutely awful',
array([0.99599314, 0.00400678], dtype=float32), <= Probability
array([-4.0148855e-03, -5.5197663e+00], dtype=float32), <= Log probability, same as previously
'Negative', <= Label
array([ 0.9181199 , 0.7763732 , 0.9999883 , -0.93533266, -0.9841384 ,
0.78126144, -0.9918988 , -0.18764131, 0.9981035 , 0.99999994,
0.900716 , -0.99926263, -0.5078789 , -0.99417543, -0.07695035,
0.9501321 , 0.75836045, 0.49151263, -0.7886792 , 0.97505844,
-0.8931161 , -1. , 0.9318583 , -0.60531116, -0.8644371 ,
...
and this last array is the 768-d [CLS] vector (the sentence embedding).
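
A possible adjustment of getPrediction() that produces tuples in exactly this shape (a sketch, mirroring the original function and the prediction keys added above):

def getPrediction(in_sentences):
  labels = ["Negative", "Positive"]
  input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, label = 0) for x in in_sentences]  # dummy label
  input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
  predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
  predictions = estimator.predict(predict_input_fn)
  return [(sentence,
           prediction['probabilities'],
           prediction['log_probabilities'],
           labels[prediction['labels']],
           prediction['pooled_output'])
          for sentence, prediction in zip(in_sentences, predictions)]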

Regarding 2): in the end, training took about 5 minutes and testing around 40 seconds on my side. Very reasonable.

Update

For 20k samples, training took 12:48 and testing 2:07.

For 10k samples, the respective times were 8:40 and 1:07.

Regarding python - Getting sentence-level embeddings after fine-tuning BERT, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60767089/
