gpt4 book ai didi

python - textblob 教程中的 UnicodeDecodeError

转载 作者:行者123 更新时间:2023-11-28 16:41:12 24 4
gpt4 key购买 nike

我正在尝试使用 Python 3.3 在 Windows 中运行 TextBlob 教程(使用 Git Bash shell)。

我已经安装了 textblobnltk 以及任何依赖项。

Python代码是:

from text.blob import TextBlob

wiki = TextBlob("Python is a high-level, general-purpose programming language.")
tags = wiki.tags

我收到以下错误

Traceback (most recent call last):
File "textblob.py", line 4, in <module>
tags = wiki.tags
File "c:\Python33\lib\site-packages\text\decorators.py", line 18, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "c:\Python33\lib\site-packages\text\blob.py", line 357, in pos_tags
for word, t in self.pos_tagger.tag(self.raw)
File "c:\Python33\lib\site-packages\text\taggers.py", line 40, in tag
return pattern_tag(sentence, tokenize)
File "c:\Python33\lib\site-packages\text\en.py", line 115, in tag
for sentence in parse(s, tokenize, True, False, False, False, encoding).split():
File "c:\Python33\lib\site-packages\text\en.py", line 99, in parse
return parser.parse(unicode(s), *args, **kwargs)
File "c:\Python33\lib\site-packages\text\text.py", line 1213, in parse
s[i] = self.find_tags(s[i], **kwargs)
File "c:\Python33\lib\site-packages\text\en.py", line 49, in find_tags
return _Parser.find_tags(self, tokens, **kwargs)
File "c:\Python33\lib\site-packages\text\text.py", line 1161, in find_tags
map = kwargs.get( "map", None))
File "c:\Python33\lib\site-packages\text\text.py", line 967, in find_tags
tagged.append([token, lexicon.get(token, i==0 and lexicon.get(token.lower()) or None)])
File "c:\Python33\lib\site-packages\text\text.py", line 98, in get
return self._lazy("get", *args)
File "c:\Python33\lib\site-packages\text\text.py", line 79, in _lazy
self.load()
File "c:\Python33\lib\site-packages\text\text.py", line 367, in load
dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if x.strip()))
File "c:\Python33\lib\site-packages\text\text.py", line 367, in <genexpr>
dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if x.strip()))
File "c:\Python33\lib\site-packages\text\text.py", line 346, in _read
for line in f:
File "c:\Python33\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 16: character maps to <undefined>

知道这里出了什么问题吗?在字符串前添加 'u' 没有帮助。

最佳答案

0.7.1 版修复了这个问题,这意味着是时候了

$ pip install -U textblob

问题是用于词性标记的 en-lexicon.txt 文件使用 Windows 的默认平台编码 cp1252 打开文件。该文件显然包含 Python 无法从此编码解码的字符。已通过以 utf-8 模式显式打开文件解决此问题。

关于python - textblob 教程中的 UnicodeDecodeError,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/18987117/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com