
python - How do I initialize a `Doc` in textacy 0.6.2?

Reposted. Author: 行者123. Updated: 2023-11-28 18:09:44

Trying to follow the simple Doc initialization in the docs does not work in Python 2:

>>> import textacy
>>> content = '''
... The apparent symmetry between the quark and lepton families of
... the Standard Model (SM) are, at the very least, suggestive of
... a more fundamental relationship between them. In some Beyond the
... Standard Model theories, such interactions are mediated by
... leptoquarks (LQs): hypothetical color-triplet bosons with both
... lepton and baryon number and fractional electric charge.'''
>>> metadata = {
... 'title': 'A Search for 2nd-generation Leptoquarks at √s = 7 TeV',
... 'author': 'Burton DeWilde',
... 'pub_date': '2012-08-01'}
>>> doc = textacy.Doc(content, metadata=metadata)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/textacy/doc.py", line 120, in __init__
    {compat.unicode_, SpacyDoc}, type(content)))
ValueError: `Doc` must be initialized with set([<type 'unicode'>, <type 'spacy.tokens.doc.Doc'>]) content, not "<type 'str'>"

What should a simple initialization look like for a string or a sequence of strings?

Update:

Passing unicode(content) to textacy.Doc() spits out

ImportError: 'cld2-cffi' must be installed to use textacy's automatic language detection; you may do so via 'pip install cld2-cffi' or 'pip install textacy[lang]'.

It would have been nice to have that dependency from the moment textacy was installed.

Even after installing cld2-cffi, trying the code above fails with

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/textacy/doc.py", line 114, in __init__
    self._init_from_text(content, metadata, lang)
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/textacy/doc.py", line 136, in _init_from_text
    spacy_lang = cache.load_spacy(langstr)
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/cachetools/__init__.py", line 46, in wrapper
    v = func(*args, **kwargs)
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/textacy/cache.py", line 99, in load_spacy
    return spacy.load(name, disable=disable)
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/spacy/__init__.py", line 21, in load
    return util.load_model(name, **overrides)
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/spacy/util.py", line 120, in load_model
    raise IOError("Can't find model '%s'" % name)
IOError: Can't find model 'en'

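Best answer

As the traceback shows, the problem is in textacy/doc.py. The _init_from_text() function tries to detect the language and, at line 136, calls cache.load_spacy() with the string 'en', which spacy cannot resolve to an installed model. (This is mentioned in this issue comment in the spacy repository.)

I solved it by providing a valid lang (unicode) string, u'en_core_web_sm', and by using unicode for both the content and lang argument strings.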

import textacy

content = u'''
The apparent symmetry between the quark and lepton families of
the Standard Model (SM) are, at the very least, suggestive of
a more fundamental relationship between them. In some Beyond the
Standard Model theories, such interactions are mediated by
leptoquarks (LQs): hypothetical color-triplet bosons with both
lepton and baryon number and fractional electric charge.'''

metadata = {
    'title': 'A Search for 2nd-generation Leptoquarks at √s = 7 TeV',
    'author': 'Burton DeWilde',
    'pub_date': '2012-08-01'}

doc = textacy.Doc(content, metadata=metadata, lang=u'en_core_web_sm')
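
If the text does not come from a literal you can prefix with u, it may help to decode it before handing it to textacy.Doc. The following is only a minimal sketch: to_text is a hypothetical helper name (not part of textacy), and UTF-8 is an assumed encoding for the incoming bytes.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def to_text(content, encoding="utf-8"):
    """Return `content` as a text (unicode) string.

    Hypothetical helper, not part of textacy. The default encoding
    is an assumption about where the bytes came from.
    """
    if isinstance(content, bytes):
        # On Python 2 a plain '...' literal is a byte string (`str`),
        # which is what triggered the ValueError in the question.
        return content.decode(encoding)
    # Already text: return it unchanged.
    return content


raw = b"leptoquarks (LQs): hypothetical color-triplet bosons"
text = to_text(raw)
```

The resulting text can then be passed as the first argument in the call above, for example textacy.Doc(text, lang=u'en_core_web_sm').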

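The fact that the behavior changes with a string rather than a unicode string (with a cryptic error message), the fact that a package is missing, and the possibly outdated or possibly incomplete way of using spacy language strings all look like bugs to me. 🤷‍♂️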
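
One way to avoid repeating the full model name at every call site is to keep an explicit mapping from short language codes to installed model packages. This is a rough sketch and not textacy's own mechanism: MODEL_BY_LANG and resolve_model_name are hypothetical names, and it assumes the en_core_web_sm model is actually installed.

```python
# Hypothetical lookup table: short language code -> installed spaCy model
# package. Extend it with whichever models you actually have installed.
MODEL_BY_LANG = {
    u"en": u"en_core_web_sm",
}


def resolve_model_name(lang):
    """Map a short code such as u'en' to a full model name.

    Values that are not in the table (for example a full model name)
    are returned unchanged.
    """
    return MODEL_BY_LANG.get(lang, lang)


lang = resolve_model_name(u"en")
```

The resolved value can then be used as the lang argument, as in textacy.Doc(content, metadata=metadata, lang=lang).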

Regarding "python - How do I initialize a `Doc` in textacy 0.6.2?", the corresponding question on Stack Overflow is: https://stackoverflow.com/questions/51431112/
