python - Trouble using tokenizer.encode_plus

Reposted by: 行者123. Updated: 2023-12-05 08:05:42

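#jupyter notebook

I am trying to study a BERT classifier using https://colab.research.google.com/drive/1pTuQhug6Dhl9XalKB0zUGf4FIdYFlpcX#scrollTo=2bBdb3pt8LuQ

In that Colab, I start from the cell that begins "Tokenize all of the sentences..."

In that part, I run into the error "TypeError: _tokenize() got an unexpected keyword argument 'pad_to_max_length'"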

input_ids = []
attention_masks = []

for sent in sentences:
    encoded_dict = tokenizer.encode_plus(
                    sent,                      # Sentence to encode.
                    add_special_tokens = True, # Add '[CLS]' and '[SEP]'
                    max_length = 64,           # Pad & truncate all sentences.
                    pad_to_max_length = True,
                    return_attention_mask = True,   # Construct attn. masks.
                    return_tensors = 'pt',     # Return pytorch tensors.
               )

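Best answer

Quoting this post:

"The issue is that conda only offers the transformers library in version 2.1.1 (repository information), and this version didn't have a pad_to_max_length argument."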
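
To confirm whether this explanation applies to your environment, you can check which transformers version is actually installed. The snippet below is a minimal sketch using only the standard library; the helper names `version_tuple`, `is_affected`, and `installed_transformers_version` are made up for illustration, and the 2.1.1 cutoff is taken from the quote above (the exact release that introduced the argument is not stated here).

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn a version string such as '2.1.1' into a comparable tuple (2, 1, 1)."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def is_affected(version):
    """True for 2.1.1 or older, the release the quoted answer says lacks the argument."""
    return version_tuple(version) <= (2, 1, 1)


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


version = installed_transformers_version()
if version is None:
    print("transformers is not installed in this environment")
elif is_affected(version):
    print(f"transformers {version}: pad_to_max_length is likely unsupported")
else:
    print(f"transformers {version}: newer than 2.1.1")
```

Run it in the same notebook kernel that raises the TypeError, since conda and pip environments can differ from the one your shell uses.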

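So perhaps the best option is to uninstall and then reinstall transformers (this time with pip install rather than conda-forge), or to create a new conda environment and install everything via pip rather than via conda.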
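
One thing to keep in mind after reinstalling: a fresh pip install usually pulls a much newer release, and since transformers 3.0.0 the tokenizer takes `padding` and `truncation` arguments, with `pad_to_max_length` kept only as a deprecated alias. The sketch below picks the padding arguments from a version string. The helper `padding_kwargs` is hypothetical, and the assumption that every release between 2.1.1 and 3.0.0 accepts the older flag is not confirmed by the answer.

```python
import re


def version_tuple(version):
    """Turn a version string such as '4.30.2' into a comparable tuple (4, 30, 2)."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def padding_kwargs(transformers_version):
    """Choose encode_plus padding arguments for the given transformers version."""
    parsed = version_tuple(transformers_version)
    if parsed <= (2, 1, 1):
        # Per the quoted answer, 2.1.1 has no pad_to_max_length argument.
        raise RuntimeError("transformers is too old; reinstall it with pip")
    if parsed >= (3, 0, 0):
        # Newer tokenizer API: pad to max_length and truncate longer inputs.
        return {"padding": "max_length", "truncation": True}
    # Assumption: releases in between accept the older flag.
    return {"pad_to_max_length": True}


# Usage sketch (needs transformers and a loaded tokenizer, so it is not run here):
# import transformers
# encoded_dict = tokenizer.encode_plus(
#     sent,
#     add_special_tokens=True,
#     max_length=64,
#     return_attention_mask=True,
#     return_tensors="pt",
#     **padding_kwargs(transformers.__version__),
# )
```

If you only ever target a recent release, you can skip the helper and pass `padding="max_length", truncation=True` directly.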

Regarding "python - Trouble using tokenizer.encode_plus", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63884856/
