
pytorch - Config change for a pre-trained transformer model

Reposted. Author: 行者123, updated: 2023-12-04 09:35:43

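I am trying to implement a classification head for the Reformer transformer. The classification head works fine. However, when I change one of the config parameters, config.axial_pos_shape (the model's sequence-length parameter), loading the checkpoint raises an error: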

size mismatch for reformer.embeddings.position_embeddings.weights.0: copying a param with shape torch.Size([512, 1, 64]) from checkpoint, the shape in current model is torch.Size([64, 1, 64]).
size mismatch for reformer.embeddings.position_embeddings.weights.1: copying a param with shape torch.Size([1, 1024, 192]) from checkpoint, the shape in current model is torch.Size([1, 128, 192]).


Config:
{
"architectures": [
"ReformerForSequenceClassification"
],
"attention_head_size": 64,
"attention_probs_dropout_prob": 0.1,
"attn_layers": [
"local",
"lsh",
"local",
"lsh",
"local",
"lsh"
],
"axial_norm_std": 1.0,
"axial_pos_embds": true,
"axial_pos_embds_dim": [
64,
192
],
"axial_pos_shape": [
64,
256
],
"chunk_size_feed_forward": 0,
"chunk_size_lm_head": 0,
"eos_token_id": 2,
"feed_forward_size": 512,
"hash_seed": null,
"hidden_act": "relu",
"hidden_dropout_prob": 0.05,
"hidden_size": 256,
"initializer_range": 0.02,
"intermediate_size": 3072,
"is_decoder": true,
"layer_norm_eps": 1e-12,
"local_attention_probs_dropout_prob": 0.05,
"local_attn_chunk_length": 64,
"local_num_chunks_after": 0,
"local_num_chunks_before": 1,
"lsh_attention_probs_dropout_prob": 0.0,
"lsh_attn_chunk_length": 64,
"lsh_num_chunks_after": 0,
"lsh_num_chunks_before": 1,
"max_position_embeddings": 8192,
"model_type": "reformer",
"num_attention_heads": 2,
"num_buckets": [
64,
128
],
"num_chunks_after": 0,
"num_chunks_before": 1,
"num_hashes": 1,
"num_hidden_layers": 6,
"output_past": true,
"pad_token_id": 0,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 100
}
},
"vocab_size": 320
}
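Python code:
import torch
from transformers import ReformerConfig, ReformerForSequenceClassification

config = ReformerConfig()
config.max_position_embeddings = 8192
config.axial_pos_shape = [64, 128]

#config = ReformerConfig.from_pretrained('./cnp/config.json', output_attentions=True)

model = ReformerForSequenceClassification(config)
# This raises the "size mismatch" error above: the checkpoint's axial
# position embeddings were created for a different axial_pos_shape
model.load_state_dict(torch.load("./cnp/pytorch_model.bin"))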
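The mismatch comes from how the axial position-embedding weights are shaped. The sketch below is a minimal pure-Python version of that shape rule. It assumes the rule mirrors the layout used by `AxialPositionEmbeddings` in Hugging Face transformers, and the helper name `axial_weight_shapes` is hypothetical. The checkpoint's `axial_pos_shape = [512, 1024]` is inferred from the shapes in the error message. `[64, 192]` is the `ReformerConfig` default for `axial_pos_embds_dim`.

```python
def axial_weight_shapes(axial_pos_shape, axial_pos_embds_dim):
    """Shape of each axial position-embedding factor (hypothetical helper).

    Factor i has axial_pos_shape[i] on position axis i, 1 on every other
    position axis, and axial_pos_embds_dim[i] as its last dimension.
    """
    shapes = []
    for axis, dim in enumerate(axial_pos_embds_dim):
        shape = [1] * len(axial_pos_shape)
        shape[axis] = axial_pos_shape[axis]
        shapes.append(tuple(shape) + (dim,))
    return shapes


# Checkpoint in ./cnp (sizes inferred from the error message)
print(axial_weight_shapes([512, 1024], [64, 192]))  # [(512, 1, 64), (1, 1024, 192)]
# Model built from the modified config
print(axial_weight_shapes([64, 128], [64, 192]))    # [(64, 1, 64), (1, 128, 192)]
```

Both factors depend on `axial_pos_shape`. For that reason, `load_state_dict` cannot copy the checkpoint tensors into a model built with a different shape unless they are resized first.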

Best answer

I ran into the same problem while trying to halve 65536 (128*512), the maximum sequence length used by default in Reformer pre-training.
As @cronoik mentioned, you have to:

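  • load the pretrained Reformer
  • resize it to your needs by removing the unnecessary weights
  • save this new model
  • load this new model to perform your desired task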

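Those unnecessary weights come from the Position Embeddings layer. The Reformer model learns its position embeddings with an Axial Position Encodings strategy, rather than storing them as one full-size matrix as BERT does. Axial Position Encodings store the position embeddings memory-efficiently, using two small tensors instead of one big one.

However, the idea behind position embeddings stays exactly the same: each position gets its own embedding.

That said, in theory (please correct me if I have misunderstood something), removing the last position embeddings to match your custom maximum sequence length should not hurt performance. You can refer to this post from HuggingFace for a more detailed description of Axial Position Encodings and to see where to truncate the position-embedding tensors.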
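The sketch below puts numbers on "two small tensors instead of one big one". It counts parameters for a single full table and for the two axial factors, using the checkpoint sizes implied by the question's error message (factors of shape (512, 1, 64) and (1, 1024, 192)). It assumes, as in the transformers Reformer, that the factor embedding dims add up to the hidden size. `position_embedding_params` is a hypothetical helper.

```python
from math import prod


def position_embedding_params(axial_pos_shape, axial_pos_embds_dim):
    """Parameter count of a full position table vs. its axial factors (hypothetical helper)."""
    num_positions = prod(axial_pos_shape)   # 512 * 1024 positions
    hidden_size = sum(axial_pos_embds_dim)  # assumed: factor dims are concatenated
    full_table = num_positions * hidden_size
    # Factor i has axial_pos_shape[i] * axial_pos_embds_dim[i] entries
    axial = sum(n * d for n, d in zip(axial_pos_shape, axial_pos_embds_dim))
    return full_table, axial


full, axial = position_embedding_params([512, 1024], [64, 192])
print(full, axial)  # 134217728 229376
```

So the factorized form needs about 0.23M parameters where a full table would need about 134M.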
I managed to resize Reformer and use it with a custom maximum length of 32768 (128*256) using the following code:
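    import torch
    from transformers import ReformerForSequenceClassification

    # Load the initial pretrained model
    model = ReformerForSequenceClassification.from_pretrained('google/reformer-enwik8', num_labels=2)

    # Reshape the Axial Position Embeddings layer to match the desired max seq length:
    # keep the first 256 of the 512 entries along the second axis. The leading
    # dimension is preserved so the tensor keeps its (1, n, d) layout and matches
    # the shape the reloaded model expects.
    emb = model.reformer.embeddings.position_embeddings
    emb.weights[1] = torch.nn.Parameter(emb.weights[1][:, :256, :].clone())

    # Update the config to match the custom max seq length
    model.config.axial_pos_shape = [128, 256]
    model.config.max_position_embeddings = 128*256  # 32768

    # Save the model with the custom max length
    output_model_path = "path/to/model"
    model.save_pretrained(output_model_path)

    # The embedding module in memory was built from the old shape, so reload before use
    model = ReformerForSequenceClassification.from_pretrained(output_model_path)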
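The code above slices the second axial factor (`weights[1]`). The toy PyTorch sketch below shows what slicing each axis does to the position-to-embedding mapping. It assumes a row-major layout in which position p uses row p // n1 of the first factor and column p % n1 of the second, as the transformers `AxialPositionEmbeddings` appears to do. The sizes are made up for illustration, and `combine_axial` is a hypothetical helper.

```python
import torch


def combine_axial(w0: torch.Tensor, w1: torch.Tensor) -> torch.Tensor:
    """Build the full (n0 * n1, d0 + d1) table from factors of shape
    (n0, 1, d0) and (1, n1, d1); position p -> (p // n1, p % n1)."""
    n0, n1 = w0.shape[0], w1.shape[1]
    rows = w0.expand(n0, n1, w0.shape[-1])  # first factor repeated along axis 1
    cols = w1.expand(n0, n1, w1.shape[-1])  # second factor repeated along axis 0
    return torch.cat([rows, cols], dim=-1).reshape(n0 * n1, -1)


torch.manual_seed(0)
n0, n1, d0, d1 = 4, 8, 2, 3   # toy stand-ins for the real sizes
w0 = torch.randn(n0, 1, d0)
w1 = torch.randn(1, n1, d1)
full = combine_axial(w0, w1)  # 32 positions
half = full.shape[0] // 2     # target max length: 16 positions

cut_axis1 = combine_axial(w0, w1[:, : n1 // 2])  # slice the second factor, as in the answer
cut_axis0 = combine_axial(w0[: n0 // 2], w1)     # slice the first factor instead

print(torch.equal(cut_axis0, full[:half]))                 # True: first 16 embeddings kept
print(torch.equal(cut_axis1[: n1 // 2], full[: n1 // 2]))  # True: first 4 unchanged
print(torch.equal(cut_axis1, full[:half]))                 # False: positions 4..15 re-mapped
```

Under that layout assumption, slicing the first factor (for the same 32768 length, `weights[0][:64]` with `axial_pos_shape = [64, 512]`) would keep the first 32768 pretrained position embeddings exactly. Slicing the second factor keeps only the first `n1` positions unchanged. Every later position is mapped to a different (row, column) pair, so it no longer receives the embedding it had during pre-training.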

Regarding "pytorch - Config change for a pre-trained transformer model", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62603089/
