Trying to follow the simple Doc initialization in the docs does not work in Python 2:
>>> import textacy
>>> content = '''
... The apparent symmetry between the quark and lepton families of
... the Standard Model (SM) are, at the very least, suggestive of
... a more fundamental relationship between them. In some Beyond the
... Standard Model theories, such interactions are mediated by
... leptoquarks (LQs): hypothetical color-triplet bosons with both
... lepton and baryon number and fractional electric charge.'''
>>> metadata = {
...     'title': 'A Search for 2nd-generation Leptoquarks at √s = 7 TeV',
...     'author': 'Burton DeWilde',
...     'pub_date': '2012-08-01'}
>>> doc = textacy.Doc(content, metadata=metadata)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/textacy/doc.py", line 120, in __init__
    {compat.unicode_, SpacyDoc}, type(content)))
ValueError: `Doc` must be initialized with set([<type 'unicode'>, <type 'spacy.tokens.doc.Doc'>]) content, not "<type 'str'>"
What should a simple initialization look like for a string or a sequence of strings?
Update:
Passing unicode(content) to textacy.Doc() spits out
ImportError: 'cld2-cffi' must be installed to use textacy's automatic language detection; you may do so via 'pip install cld2-cffi' or 'pip install textacy[lang]'.
It would be nice if this worked from the moment textacy is installed.
Even after installing cld2-cffi, trying the code above fails with
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/textacy/doc.py", line 114, in __init__
    self._init_from_text(content, metadata, lang)
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/textacy/doc.py", line 136, in _init_from_text
    spacy_lang = cache.load_spacy(langstr)
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/cachetools/__init__.py", line 46, in wrapper
    v = func(*args, **kwargs)
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/textacy/cache.py", line 99, in load_spacy
    return spacy.load(name, disable=disable)
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/spacy/__init__.py", line 21, in load
    return util.load_model(name, **overrides)
  File "/Users/a/anaconda/envs/env1/lib/python2.7/site-packages/spacy/util.py", line 120, in load_model
    raise IOError("Can't find model '%s'" % name)
IOError: Can't find model 'en'
Best answer
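As the traceback shows, the problem is in textacy/doc.py. The _init_from_text() function tries to detect the language and, at line 136, calls cache.load_spacy() with the string 'en', which spaCy cannot resolve to an installed model. (The spacy repository touches on this in this issue comment.)
I solved this by passing a valid lang string, u'en_core_web_sm', and by making both the content and lang arguments unicode strings: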
import textacy
content = u'''
The apparent symmetry between the quark and lepton families of
the Standard Model (SM) are, at the very least, suggestive of
a more fundamental relationship between them. In some Beyond the
Standard Model theories, such interactions are mediated by
leptoquarks (LQs): hypothetical color-triplet bosons with both
lepton and baryon number and fractional electric charge.'''
metadata = {
    'title': 'A Search for 2nd-generation Leptoquarks at √s = 7 TeV',
    'author': 'Burton DeWilde',
    'pub_date': '2012-08-01'}
doc = textacy.Doc(content, metadata=metadata, lang=u'en_core_web_sm')
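That passing a str instead of a unicode string changes the behavior (with a cryptic error message), that a package is missing, and that the way spacy language strings are used is perhaps outdated or perhaps not comprehensive all seem like bugs to me. 🤷♂️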
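The two preconditions described above (unicode content, and a model name that is actually installed) can be checked before calling textacy.Doc(). The following is only a sketch: ensure_text and model_is_installed are hypothetical helper names, not part of textacy or spacy, and the check assumes the model was installed as an importable Python package such as en_core_web_sm.

```python
import importlib
import sys

# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def ensure_text(content, encoding="utf-8"):
    """Return `content` as a text (unicode) string, decoding bytes if needed.

    Hypothetical helper: on Python 2 a plain '...' literal is a byte string,
    which is what triggered the ValueError in the question.
    """
    if isinstance(content, text_type):
        return content
    if isinstance(content, bytes):
        return content.decode(encoding)
    raise TypeError("expected text or bytes, got %r" % type(content))


def model_is_installed(package_name):
    """Return True if a package with this name can be imported.

    Hypothetical helper: assumes the spaCy model is installed as a regular
    Python package (for example 'en_core_web_sm').
    """
    try:
        importlib.import_module(package_name)
    except ImportError:
        return False
    return True
```

With these in place, the call from the answer could be written as textacy.Doc(ensure_text(content), metadata=metadata, lang=u'en_core_web_sm'), guarded by model_is_installed('en_core_web_sm'). If the model is missing, it can typically be installed with python -m spacy download en_core_web_sm.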
Regarding python - How to initialize a `Doc` in textacy 0.6.2?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51431112/