gpt4 book ai didi

python - 为什么在请求同义词集时我不能将 wn.ADJ_SAT 作为 pos 传递

转载 作者:太空宇宙 更新时间:2023-11-04 03:40:07 25 4
gpt4 key购买 nike

我知道 wordnet 有一个 "adverb synset" type .我知道那是在 nltk 中的 synset 类型枚举中

from nltk.corpus import wordnet as wn
wn.ADJ_SAT
u's'

为什么我不能将它作为键传递给同义词集?

>>> wn.synsets('dog', wn.ADJ_SAT)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/nltk/corpus/reader/wordnet.py", line 1413, in synsets
for form in self._morphy(lemma, p)
File "/Library/Python/2.7/site-packages/nltk/corpus/reader/wordnet.py", line 1627, in _morphy
substitutions = self.MORPHOLOGICAL_SUBSTITUTIONS[pos]
KeyError: u's'

最佳答案

来自:

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('able')
[Synset('able.a.01'), Synset('able.s.02'), Synset('able.s.03'), Synset('able.s.04')]
>>> wn.synsets('able', pos=wn.ADJ)
[Synset('able.a.01'), Synset('able.s.02'), Synset('able.s.03'), Synset('able.s.04')]
>>> wn.synsets('able', pos=wn.ADJ_SAT)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 1413, in synsets
for form in self._morphy(lemma, p)
File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 1627, in _morphy
substitutions = self.MORPHOLOGICAL_SUBSTITUTIONS[pos]
KeyError: u's'

来自 https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1397 ,我们看到当您尝试从 NLTK wordnet API 检索同义词集时,POS 限制出现在调用 self._morphy(lemma, p) 功能:

def synsets(self, lemma, pos=None, lang='en'):
"""Load all synsets with a given lemma and part of speech tag.
If no pos is specified, all synsets for all parts of speech
will be loaded.
If lang is specified, all the synsets associated with the lemma name
of that language will be returned.
"""
lemma = lemma.lower()

if lang == 'en':
get_synset = self._synset_from_pos_and_offset
index = self._lemma_pos_offset_map
if pos is None:
pos = POS_LIST
return [get_synset(p, offset)
for p in pos
for form in self._morphy(lemma, p)
for offset in index[form].get(p, [])]

如果我们查看 _morphy() 函数,来自 https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1573 .

 def _morphy(self, form, pos):
# from jordanbg:
# Given an original string x
# 1. Apply rules once to the input to get y1, y2, y3, etc.
# 2. Return all that are in the database
# 3. If there are no matches, keep applying rules until you either
# find a match or you can't go any further

exceptions = self._exception_map[pos]
substitutions = self.MORPHOLOGICAL_SUBSTITUTIONS[pos]

def apply_rules(forms):
return [form[:-len(old)] + new
for form in forms
for old, new in substitutions
if form.endswith(old)]

def filter_forms(forms):
result = []
seen = set()
for form in forms:
if form in self._lemma_pos_offset_map:
if pos in self._lemma_pos_offset_map[form]:
if form not in seen:
result.append(form)
seen.add(form)
return result

# 0. Check the exception lists
if form in exceptions:
return filter_forms([form] + exceptions[form])

# 1. Apply rules once to the input to get y1, y2, y3, etc.
forms = apply_rules([form])

# 2. Return all that are in the database (and check the original too)
results = filter_forms([form] + forms)
if results:
return results

# 3. If there are no matches, keep applying rules until we find a match
while forms:
forms = apply_rules(forms)
results = filter_forms(forms)
if results:
return results

# Return an empty list if we can't find anything
return []

我们看到它从 substitutions = self.MORPHOLOGICAL_SUBSTITUTIONS[pos] 中检索一些替换规则,以在检索存储在“based”/“root”形式中的同义词集之前执行一些词法还原.例如

>>> from nltk.corpus import wordnet as wn
>>> wn._morphy('dogs', 'n')
[u'dog']

如果我们查看 MORPHOLOGICAL_SUBSTITUTIONS,我们会发现缺少 ADJ_SAT,请参阅 https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1609 :

MORPHOLOGICAL_SUBSTITUTIONS = {
NOUN: [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'),
('men', 'man'), ('ies', 'y')],
VERB: [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''),
('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')],
ADJ: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
ADV: []}

因此,为了防止这种情况发生,可以在 https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1609 的第 1609 行之后添加此行进行简单修复。 :

MORPHOLOGICAL_SUBSTITUTIONS[ADJ_SAT] = MORPHOLOGICAL_SUBSTITUTIONS[ADJ]

概念验证:

>>> MORPHOLOGICAL_SUBSTITUTIONS = {
... 1: [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
... ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'),
... ('men', 'man'), ('ies', 'y')],
... 2: [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''),
... ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')],
... 3: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
... 4: []}
>>>
>>> MORPHOLOGICAL_SUBSTITUTIONS[5] = MORPHOLOGICAL_SUBSTITUTIONS[3]
>>> MORPHOLOGICAL_SUBSTITUTIONS
{1: [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'), ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'), ('men', 'man'), ('ies', 'y')], 2: [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''), ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')], 3: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')], 4: [], 5: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')]}

关于python - 为什么在请求同义词集时我不能将 wn.ADJ_SAT 作为 pos 传递,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/26900954/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com