
python - Error when trying to fine-tune ReformerModelWithLMHead (google/reformer-enwik8) for NER


I am trying to fine-tune ReformerModelWithLMHead (google/reformer-enwik8) for NER. I used the same padded sequence length as in the encode method (max_length = max([len(string) for string in list_of_strings])) together with attention_masks, and I got this error:
ValueError: If training, make sure that config.axial_pos_shape factors: (128, 512) multiply to sequence length. Got prod((128, 512)) != sequence_length: 2248. You might want to consider padding your sequence length to 65536 or changing config.axial_pos_shape.

  • When I change the sequence length to 65536, my Colab session crashes because it has to hold all the inputs at length 65536.
  • As for the second option (changing config.axial_pos_shape), I was not able to change it.

  • So I would like to know: is there a way to change config.axial_pos_shape while fine-tuning the model? Or am I missing something when encoding the input strings for reformer-enwik8 (see the encode sketch right after this list)?
    Thanks!
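
    For context, the encode helper referred to above (paraphrased from the google/reformer-enwik8 model card) pads every string only to the length of the longest string in the batch and builds the matching attention masks, roughly like this:

    import torch

    # Character-level encoding (paraphrased from the model card): each byte is
    # shifted by 2 so that 0 stays free as the padding id.
    def encode(list_of_strings, pad_token_id=0):
        max_length = max([len(string) for string in list_of_strings])

        # start with all-padding ids and all-zero attention masks
        attention_masks = torch.zeros((len(list_of_strings), max_length), dtype=torch.long)
        input_ids = torch.full((len(list_of_strings), max_length), pad_token_id, dtype=torch.long)

        for idx, string in enumerate(list_of_strings):
            # make sure the string is in byte format
            if not isinstance(string, bytes):
                string = str.encode(string)
            input_ids[idx, :len(string)] = torch.tensor([x + 2 for x in string])
            attention_masks[idx, :len(string)] = 1

        return input_ids, attention_masks

    Note that this pads only to the longest string in the batch (2248 here), not to a multiple of 65536, which is exactly why the ValueError above is raised during training.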
    Question update: I tried the following:
  • By passing the parameters at model instantiation:

  • model = transformers.ReformerModelWithLMHead.from_pretrained(
        "google/reformer-enwik8", num_labels=9, max_position_embeddings=1024,
        axial_pos_shape=[16, 64], axial_pos_embds_dim=[32, 96], hidden_size=128)


    It gave me the following error:

    RuntimeError: Error(s) in loading state_dict for ReformerModelWithLMHead:
        size mismatch for reformer.embeddings.word_embeddings.weight: copying a param with shape torch.Size([258, 1024]) from checkpoint, the shape in current model is torch.Size([258, 128]).
        size mismatch for reformer.embeddings.position_embeddings.weights.0: copying a param with shape torch.Size([128, 1, 256]) from checkpoint, the shape in current model is torch.Size([16, 1, 32]).


    It is quite a long error.
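
    The size mismatches are expected: overriding hidden_size, axial_pos_shape and axial_pos_embds_dim changes the shapes of the embedding (and most other) weights, so they can no longer be copied from the enwik8 checkpoint. As a sketch only (assuming a transformers version that supports the ignore_mismatched_sizes argument), the loader can be told to skip those weights instead of raising; note that every skipped weight is then randomly re-initialised, so this largely defeats the purpose of starting from the pretrained model:

    import transformers

    # Sketch: ignore_mismatched_sizes drops every checkpoint weight whose shape no
    # longer matches the smaller architecture; those weights are re-initialised randomly.
    model = transformers.ReformerModelWithLMHead.from_pretrained(
        "google/reformer-enwik8",
        num_labels=9,
        max_position_embeddings=1024,   # 16 * 64
        axial_pos_shape=[16, 64],
        axial_pos_embds_dim=[32, 96],   # 32 + 96 == hidden_size
        hidden_size=128,
        ignore_mismatched_sizes=True,
    )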
  • Then I tried to update the config with this code:

  • model1 = transformers.ReformerModelWithLMHead.from_pretrained('google/reformer-enwik8', num_labels=9)


    # Reshape the axial position embeddings layer to match the desired max sequence length
    model1.reformer.embeddings.position_embeddings.weights[1] = torch.nn.Parameter(
        model1.reformer.embeddings.position_embeddings.weights[1][0][:128])
    # Update the config to match the custom max sequence length
    model1.config.axial_pos_shape = 16, 128
    model1.config.max_position_embeddings = 16 * 128  # 2048
    model1.config.axial_pos_embds_dim = 32, 96
    model1.config.hidden_size = 128
    output_model_path = "model"
    model1.save_pretrained(output_model_path)
    With this implementation I get this error:

    RuntimeError: The expanded size of the tensor (512) must match the existing size (128) at non-singleton dimension 2. Target sizes: [1, 128, 512, 768]. Tensor sizes: [128, 768]


    This is because the updated sizes/shapes no longer match the original config parameters of the pretrained model. The original parameters are: axial_pos_shape = 128, 512; max_position_embeddings = 128 * 512 # 65536; axial_pos_embds_dim = 256, 768; hidden_size = 1024.
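
    For reference, these original values can be read straight from the pretrained config, which also makes the two constraints (product of axial_pos_shape = max_position_embeddings, sum of axial_pos_embds_dim = hidden_size) visible. A minimal check, assuming only the public model name:

    from transformers import ReformerConfig

    cfg = ReformerConfig.from_pretrained("google/reformer-enwik8")
    print(cfg.axial_pos_shape)          # (128, 512)
    print(cfg.max_position_embeddings)  # 65536 == 128 * 512
    print(cfg.axial_pos_embds_dim)      # (256, 768)
    print(cfg.hidden_size)              # 1024 == 256 + 768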
    Is this the right way to change the config parameters, or do I have to do something else?
    Is there any example where the ReformerModelWithLMHead('google/reformer-enwik8') model has been fine-tuned?
    My main code implementation is as follows:
    class REFORMER(torch.nn.Module):
        def __init__(self):
            super(REFORMER, self).__init__()
            self.l1 = transformers.ReformerModelWithLMHead.from_pretrained("google/reformer-enwik8", num_labels=9)

        def forward(self, input_ids, attention_masks, labels):
            output_1 = self.l1(input_ids, attention_masks, labels=labels)
            return output_1


    model = REFORMER()

    def train(epoch):
        model.train()
        for _, data in enumerate(training_loader, 0):
            ids = data['input_ids'][0]  # input_ids from the encode method of the model https://huggingface.co/google/reformer-enwik8#:~:text=import%20torch%0A%0A%23%20Encoding-,def%20encode,-(list_of_strings%2C%20pad_token_id%3D0
            input_shape = ids.size()
            targets = data['tags']
            print("tags: ", targets, targets.size())
            least_common_mult_chunk_length = 65536
            padding_length = least_common_mult_chunk_length - input_shape[-1] % least_common_mult_chunk_length
            # pad input
            input_ids, inputs_embeds, attention_mask, position_ids, input_shape = _pad_to_mult_of_chunk_length(
                self=model.l1,
                input_ids=ids,
                inputs_embeds=None,
                attention_mask=None,
                position_ids=None,
                input_shape=input_shape,
                padding_length=padding_length,
                padded_seq_length=None,
                device=None,
            )
            outputs = model(input_ids, attention_mask, labels=targets)  # sending inputs to the forward method
            print(outputs)
            loss = outputs.loss
            logits = outputs.logits
            if _ % 500 == 0:
                print(f'Epoch: {epoch}, Loss: {loss}')

    for epoch in range(1):
        train(epoch)
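
    _pad_to_mult_of_chunk_length is a private helper of the model, so its signature may change between transformers versions. A hedged alternative that does the same job with plain PyTorch, padding input_ids with the pad token and extending the attention mask with zeros up to the next multiple of 65536 (pad_token_id=0 and the chunk length are assumptions taken from the setup above), could look like this:

    import torch.nn.functional as F

    def pad_to_multiple(input_ids, attention_mask, multiple=65536, pad_token_id=0):
        # Right-pad a (batch, seq_len) batch so that seq_len becomes a multiple of `multiple`.
        seq_len = input_ids.shape[-1]
        padding_length = (-seq_len) % multiple  # 0 if already aligned
        if padding_length == 0:
            return input_ids, attention_mask
        input_ids = F.pad(input_ids, (0, padding_length), value=pad_token_id)
        attention_mask = F.pad(attention_mask, (0, padding_length), value=0)
        return input_ids, attention_mask

    # usage inside the training loop (sketch):
    # ids, masks = pad_to_multiple(data['input_ids'], data['attention_masks'])
    # outputs = model(ids, masks, labels=targets)

    Note that the NER tags would need the same padding (for example with an ignore index) before a loss is computed over the padded positions.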

    Best Answer

    The Reformer model was proposed in the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser and Anselm Levskaya.
    The paper introduces a way to factorize the huge position-embedding matrix that results from handling very long sequences. This factorization relies on 2 assumptions:

  • the parameter config.axial_pos_embds_dim is set to a tuple (d1, d2) whose sum must equal config.hidden_size
  • config.axial_pos_shape is set to a tuple (n1, n2) whose product must equal config.max_embedding_size
    (more on these here!)
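
    As an illustration of those two assumptions, the sketch below keeps the pretrained hidden size (1024 = 256 + 768) but shrinks the axial position shape so that its product matches a shorter maximum sequence length; the concrete numbers are an example only, and the resulting model is freshly initialised rather than loaded from the enwik8 checkpoint:

    from transformers import ReformerConfig, ReformerModelWithLMHead

    config = ReformerConfig.from_pretrained("google/reformer-enwik8")
    config.axial_pos_shape = (16, 128)          # 16 * 128 == 2048
    config.max_position_embeddings = 16 * 128   # product constraint
    config.axial_pos_embds_dim = (256, 768)     # 256 + 768 == 1024 == config.hidden_size (sum constraint)

    assert config.axial_pos_shape[0] * config.axial_pos_shape[1] == config.max_position_embeddings
    assert sum(config.axial_pos_embds_dim) == config.hidden_size

    model = ReformerModelWithLMHead(config)     # random weights; the enwik8 checkpoint is not loaded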

  • Finally, to your questions ;)
  • I am almost certain that your session crashed because it ran out of RAM.
  • You can change any config parameter during model instantiation, as shown in the official documentation!
  • Regarding python - Error when trying to fine-tune ReformerModelWithLMHead (google/reformer-enwik8) for NER, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/68742863/
