
python - OSError: Can't load tokenizer

Reposted · Author: 行者123 · Updated: 2023-12-05 06:04:23

I want to train an XLNet language model from scratch. First, I trained a tokenizer as follows:

from tokenizers import ByteLevelBPETokenizer

# Initialize a tokenizer
tokenizer = ByteLevelBPETokenizer()
# Customize training
tokenizer.train(files='data.txt', min_frequency=2, special_tokens=[  # default vocab_size
    "<s>",
    "<pad>",
    "</s>",
    "<unk>",
    "<mask>",
])
tokenizer.save_model("tokenizer model")

Afterwards, the given directory contains two files:

merges.txt
vocab.json

I have defined the following configuration for the model:

from transformers import XLNetConfig, XLNetModel
config = XLNetConfig()
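(When training from scratch, the config's vocab_size should match the trained tokenizer's vocabulary; a minimal sketch, using 30000 only as an illustrative value since it is ByteLevelBPETokenizer's default training vocab size:)

```python
from transformers import XLNetConfig

# Make the model's embedding table match the tokenizer's vocabulary.
# 30000 is illustrative here (ByteLevelBPETokenizer's default vocab_size);
# use your tokenizer's actual get_vocab_size() in practice.
config = XLNetConfig(vocab_size=30000)
print(config.vocab_size)
```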

Now, I want to recreate my tokenizer in transformers:

from transformers import XLNetTokenizerFast

tokenizer = XLNetTokenizerFast.from_pretrained("tokenizer model")

But I get the following error:

File "dfgd.py", line 8, in <module>
tokenizer = XLNetTokenizerFast.from_pretrained("tokenizer model")
File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\transformers\tokenization_utils_base.py", line 1777, in from_pretrained
raise EnvironmentError(msg)
OSError: Can't load tokenizer for 'tokenizer model'. Make sure that:

- 'tokenizer model' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'tokenizer model' is the correct path to a directory containing relevant tokenizer files

What should I do?

Best Answer

XLNetTokenizerFast.from_pretrained expects a directory with tokenizer files in a format it recognizes (XLNet's SentencePiece spiece.model, or a serialized tokenizer.json), but save_model() only wrote vocab.json and merges.txt, so the load fails. Instead of

tokenizer = XLNetTokenizerFast.from_pretrained("tokenizer model")

you should write:

from tokenizers.implementations import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer(
    "tokenizer model/vocab.json",
    "tokenizer model/merges.txt",
)
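If you need a tokenizer that works with the transformers API (e.g. with a Trainer), the trained tokenizer can also be wrapped in PreTrainedTokenizerFast. A minimal, self-contained sketch, where the tiny corpus and temporary paths are purely illustrative:

```python
import os
import tempfile

from tokenizers import ByteLevelBPETokenizer
from transformers import PreTrainedTokenizerFast

# Toy corpus standing in for data.txt
workdir = tempfile.mkdtemp()
corpus = os.path.join(workdir, "data.txt")
with open(corpus, "w", encoding="utf-8") as f:
    f.write("hello world\nhello tokenizer\n")

# Train as in the question
bpe = ByteLevelBPETokenizer()
bpe.train(files=corpus, min_frequency=1,
          special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"])

# save() (unlike save_model()) writes a single tokenizer.json
# that PreTrainedTokenizerFast can load directly
json_path = os.path.join(workdir, "tokenizer.json")
bpe.save(json_path)

tokenizer = PreTrainedTokenizerFast(
    tokenizer_file=json_path,
    bos_token="<s>", eos_token="</s>", unk_token="<unk>",
    pad_token="<pad>", mask_token="<mask>",
)
ids = tokenizer("hello world")["input_ids"]
print(ids)
```

The wrapped tokenizer can then be persisted with tokenizer.save_pretrained(...) and reloaded later via from_pretrained on that directory.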

Regarding python - OSError: Can't load tokenizer, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66293355/
