
huggingface-transformers - Loading a pretrained model from disk with Huggingface Transformers

Reposted · Author: 行者123 · Updated: 2023-12-04 11:03:13

From the documentation for `from_pretrained`, I understand that I don't have to download the pretrained vectors every time; I can save them and later load them from disk with this syntax:

  - a path to a `directory` containing vocabulary files required by the tokenizer, for instance saved using the :func:`~transformers.PreTrainedTokenizer.save_pretrained` method, e.g.: ``./my_model_directory/``.
- (not applicable to all derived classes, deprecated) a path or url to a single saved vocabulary file if and only if the tokenizer only requires a single vocabulary file (e.g. Bert, XLNet), e.g.: ``./my_model_directory/vocab.txt``.
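In other words, `from_pretrained` expects a directory in the layout that `save_pretrained` writes, with `config.json` at its root. As a quick sanity check before calling the loader, here is a minimal sketch using only the standard library (the file names are the usual Transformers ones, an assumption on my part, not taken from this post):

```python
import os

# Names save_pretrained() typically writes: config.json is mandatory,
# plus one weights file whose name depends on the framework.
REQUIRED = {"config.json"}
WEIGHT_FILES = {"pytorch_model.bin", "tf_model.h5", "model.safetensors"}

def looks_like_saved_model(path):
    """Return True if `path` resembles a save_pretrained() directory."""
    try:
        files = set(os.listdir(path))
    except OSError:
        return False  # missing directory, or a file path rather than a folder
    return REQUIRED <= files and bool(WEIGHT_FILES & files)
```

A folder containing only `bert_config.json` and `bert_model.ckpt.*` files would fail this check.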
So, I went to the model hub:
  • https://huggingface.co/models

  • I found the model I wanted:
  • https://huggingface.co/bert-base-cased

  • I downloaded it from the link they provide to this repository:

    Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is case-sensitive: it makes a difference between english and English.


    Stored in:
      /my/local/models/cased_L-12_H-768_A-12/
    which contains:
     ./
    ../
    bert_config.json
    bert_model.ckpt.data-00000-of-00001
    bert_model.ckpt.index
    bert_model.ckpt.meta
    vocab.txt
    So, now I have the following:
      PATH = '/my/local/models/cased_L-12_H-768_A-12/'
    tokenizer = BertTokenizer.from_pretrained(PATH, local_files_only=True)
    and I get this error:
    >           raise EnvironmentError(msg)
    E OSError: Can't load config for '/my/local/models/cased_L-12_H-768_A-12/'. Make sure that:
    E
    E - '/my/local/models/cased_L-12_H-768_A-12/' is a correct model identifier listed on 'https://huggingface.co/models'
    E
    E - or '/my/local/models/cased_L-12_H-768_A-12/' is the correct path to a directory containing a config.json file
    Similarly, when I point it directly at the config.json:
      PATH = '/my/local/models/cased_L-12_H-768_A-12/bert_config.json'
    tokenizer = BertTokenizer.from_pretrained(PATH, local_files_only=True)

    if state_dict is None and not from_tf:
        try:
            state_dict = torch.load(resolved_archive_file, map_location="cpu")
        except Exception:
            raise OSError(
>               "Unable to load weights from pytorch checkpoint file. "
                "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True. "
            )
    E OSError: Unable to load weights from pytorch checkpoint file. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
    What should I do to get Huggingface to use my local pretrained model?
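One reading of the two tracebacks (my interpretation, not stated in the post): the downloaded folder is a Google-format TensorFlow checkpoint, so its config is named bert_config.json while Transformers looks for config.json, and the weights are TF rather than PyTorch. A sketch of the first half of a fix, aliasing the config file with only the standard library:

```python
import os
import shutil

def alias_bert_config(model_dir):
    """Copy bert_config.json to config.json, the name Transformers expects.

    Returns True if config.json exists afterwards.
    """
    src = os.path.join(model_dir, "bert_config.json")
    dst = os.path.join(model_dir, "config.json")
    if os.path.exists(src) and not os.path.exists(dst):
        shutil.copyfile(src, dst)
    return os.path.exists(dst)
```

With config.json in place, the weights are still a TF checkpoint, so the second traceback's own hint would apply: pass `from_tf=True` to `from_pretrained` (and optionally re-save with `save_pretrained` so later loads need no flag).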
    Update, to address the comments
    YOURPATH = '/somewhere/on/disk/'

    name = 'transfo-xl-wt103'
    tokenizer = TransfoXLTokenizerFast.from_pretrained(name)
    model = TransfoXLModel.from_pretrained(name)
    tokenizer.save_pretrained(YOURPATH)
    model.save_pretrained(YOURPATH)

    >>> Please note you will not be able to load the save vocabulary in Rust-based TransfoXLTokenizerFast as they don't share the same structure.
    ('/somewhere/on/disk/vocab.bin', '/somewhere/on/disk/special_tokens_map.json', '/somewhere/on/disk/added_tokens.json')

    So everything got saved, but then...
    YOURPATH = '/somewhere/on/disk/'
    TransfoXLTokenizerFast.from_pretrained('transfo-xl-wt103', cache_dir=YOURPATH, local_files_only=True)

    "Cannot find the requested files in the cached path and outgoing traffic has been"
    ValueError: Cannot find the requested files in the cached path and outgoing traffic has been disabled. To enable model look-ups and downloads online, set 'local_files_only' to False.
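A plausible explanation for this last error (my assumption, not from the post): `cache_dir` points at Transformers' download cache, which stores files under opaque hashed names, whereas `save_pretrained` writes plain files like config.json and vocab.bin. A directory produced by one is invisible to the other, so the saved folder should be passed as the model path itself, e.g. `TransfoXLTokenizerFast.from_pretrained(YOURPATH)` with no `cache_dir`. A stdlib sketch of the distinction:

```python
import os

def classify_model_dir(path):
    """Guess how a local directory relates to Transformers.

    'saved' : written by save_pretrained(); pass it as the model path.
    'other' : not a saved model; at best usable as cache_dir= for downloads.
    """
    files = set(os.listdir(path))
    if files & {"config.json", "tokenizer_config.json", "vocab.bin"}:
        return "saved"
    return "other"
```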

    Best Answer

    Where is that file located relative to your model folder? I believe it has to be a relative path rather than an absolute one. So if the file where you write your code lives in 'my/local/', your code should look like this:

    PATH = 'models/cased_L-12_H-768_A-12/'
    tokenizer = BertTokenizer.from_pretrained(PATH, local_files_only=True)
    You only need to point to the folder that holds all the files, not to a file directly. I think this is definitely a problem with PATH. Try changing the slash style: '/' vs. '\' differ between operating systems. Also try prefixing with '.', as in ./models/cased_L-12_H-768_A-12/, and so on.
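The slash advice can be made moot by normalizing the path up front; a small sketch using only pathlib (nothing here is specific to Transformers):

```python
from pathlib import Path

# A relative path and its "./"-prefixed spelling resolve to the same
# absolute location, so slash style stops mattering once normalized.
p1 = Path("models/cased_L-12_H-768_A-12")
p2 = Path("./models") / "cased_L-12_H-768_A-12"
assert p1.resolve() == p2.resolve()

# str(p1.resolve()) is then a plain absolute path string that can be
# handed to from_pretrained() regardless of operating system.
```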

    On huggingface-transformers - loading a pretrained model from disk with Huggingface Transformers, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64001128/
