
python - Why can't I pass wn.ADJ_SAT as pos when requesting synsets?

Reposted by: 太空宇宙 | Updated: 2023-11-04 03:40:07

I know that WordNet has an "adjective satellite" synset type, and I know that it is in the synset type enum in NLTK:

>>> from nltk.corpus import wordnet as wn
>>> wn.ADJ_SAT
u's'

So why can't I pass it as the pos argument to synsets?

>>> wn.synsets('dog', wn.ADJ_SAT)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/nltk/corpus/reader/wordnet.py", line 1413, in synsets
    for form in self._morphy(lemma, p)
  File "/Library/Python/2.7/site-packages/nltk/corpus/reader/wordnet.py", line 1627, in _morphy
    substitutions = self.MORPHOLOGICAL_SUBSTITUTIONS[pos]
KeyError: u's'

Best answer

Reproducing the problem:

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('able')
[Synset('able.a.01'), Synset('able.s.02'), Synset('able.s.03'), Synset('able.s.04')]
>>> wn.synsets('able', pos=wn.ADJ)
[Synset('able.a.01'), Synset('able.s.02'), Synset('able.s.03'), Synset('able.s.04')]
>>> wn.synsets('able', pos=wn.ADJ_SAT)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 1413, in synsets
    for form in self._morphy(lemma, p)
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 1627, in _morphy
    substitutions = self.MORPHOLOGICAL_SUBSTITUTIONS[pos]
KeyError: u's'

From https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1397 we can see that when you retrieve synsets through the NLTK WordNet API, the POS restriction is applied in the call to self._morphy(lemma, p):

def synsets(self, lemma, pos=None, lang='en'):
    """Load all synsets with a given lemma and part of speech tag.
    If no pos is specified, all synsets for all parts of speech
    will be loaded. 
    If lang is specified, all the synsets associated with the lemma name
    of that language will be returned.
    """
    lemma = lemma.lower()

    if lang == 'en':
        get_synset = self._synset_from_pos_and_offset
        index = self._lemma_pos_offset_map
        if pos is None:
            pos = POS_LIST
        return [get_synset(p, offset)
                for p in pos
                for form in self._morphy(lemma, p)
                for offset in index[form].get(p, [])]
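A stdlib-only sketch of just the POS loop in synsets() helps explain why the call without pos works while pos=wn.ADJ_SAT fails. This is not NLTK's code: POS_LIST below is an assumption mirroring NLTK's default list of noun, verb, adjective and adverb codes, and pos_codes_tried is a hypothetical helper name.

```python
# Assumption: mirrors NLTK's default POS_LIST = [NOUN, VERB, ADJ, ADV].
# ADJ_SAT ('s') is not in the default list.
POS_LIST = ['n', 'v', 'a', 'r']


def pos_codes_tried(pos=None):
    """Return the POS codes synsets() would hand to _morphy, in order.

    Like the `for p in pos` loop in synsets(), a single-letter string
    such as 's' is iterated character by character, giving ['s'].
    """
    if pos is None:
        pos = POS_LIST
    return [p for p in pos]


print(pos_codes_tried())     # ['n', 'v', 'a', 'r']
print(pos_codes_tried('s'))  # ['s']
```

So with no pos argument, _morphy is never called with 's'. Satellite synsets such as able.s.02 still show up because they are reached through the 'a' code, as the wn.synsets('able', pos=wn.ADJ) output above shows.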

If we look at the _morphy() function, at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1573:

    def _morphy(self, form, pos):
        # from jordanbg:
        # Given an original string x
        # 1. Apply rules once to the input to get y1, y2, y3, etc.
        # 2. Return all that are in the database
        # 3. If there are no matches, keep applying rules until you either
        #    find a match or you can't go any further

        exceptions = self._exception_map[pos]
        substitutions = self.MORPHOLOGICAL_SUBSTITUTIONS[pos]

        def apply_rules(forms):
            return [form[:-len(old)] + new
                    for form in forms
                    for old, new in substitutions
                    if form.endswith(old)]

        def filter_forms(forms):
            result = []
            seen = set()
            for form in forms:
                if form in self._lemma_pos_offset_map:
                    if pos in self._lemma_pos_offset_map[form]:
                        if form not in seen:
                            result.append(form)
                            seen.add(form)
            return result

        # 0. Check the exception lists
        if form in exceptions:
            return filter_forms([form] + exceptions[form])

        # 1. Apply rules once to the input to get y1, y2, y3, etc.
        forms = apply_rules([form])

        # 2. Return all that are in the database (and check the original too)
        results = filter_forms([form] + forms)
        if results:
            return results

        # 3. If there are no matches, keep applying rules until we find a match
        while forms:
            forms = apply_rules(forms)
            results = filter_forms(forms)
            if results:
                return results

        # Return an empty list if we can't find anything
        return []

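We can see that it fetches a list of suffix-substitution rules via substitutions = self.MORPHOLOGICAL_SUBSTITUTIONS[pos] and uses them to reduce the word before looking up the synsets, which are stored under their "base"/"root" forms. For example: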

>>> from nltk.corpus import wordnet as wn
>>> wn._morphy('dogs', 'n')
[u'dog']
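To see the rule-based reduction in isolation, here is a simplified, stdlib-only sketch of steps 1 to 3 of _morphy(). The noun rules are copied from the NOUN entry of MORPHOLOGICAL_SUBSTITUTIONS; the tiny LEXICON set and the name morphy_sketch are hypothetical stand-ins for WordNet's per-POS lemma index, and the exception-list step (step 0) is omitted.

```python
# Noun rules copied from NLTK's MORPHOLOGICAL_SUBSTITUTIONS[NOUN].
NOUN_RULES = [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
              ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'),
              ('men', 'man'), ('ies', 'y')]

# Hypothetical toy lexicon standing in for WordNet's noun lemma index.
LEXICON = {'dog', 'box', 'wolf', 'church'}


def apply_rules(forms, rules):
    """One pass: for every matching suffix, swap it for its replacement."""
    return [form[:-len(old)] + new
            for form in forms
            for old, new in rules
            if form.endswith(old)]


def morphy_sketch(form, rules, lexicon):
    """Simplified _morphy: no exception lists, one POS, de-duplicated."""
    def filter_forms(forms):
        result = []
        for f in forms:
            if f in lexicon and f not in result:
                result.append(f)
        return result

    # 1. Apply the rules once; 2. keep candidates (and the original) found in the lexicon.
    forms = apply_rules([form], rules)
    results = filter_forms([form] + forms)
    if results:
        return results
    # 3. Keep applying rules until something matches or no rule applies.
    while forms:
        forms = apply_rules(forms, rules)
        results = filter_forms(forms)
        if results:
            return results
    return []


print(morphy_sketch('dogs', NOUN_RULES, LEXICON))    # ['dog']
print(morphy_sketch('wolves', NOUN_RULES, LEXICON))  # ['wolf']
print(morphy_sketch('cat', NOUN_RULES, LEXICON))     # []
```

Note that apply_rules(['boxes'], NOUN_RULES) yields both 'boxe' and 'box'; only the lexicon check discards the bogus candidate.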

If we look at MORPHOLOGICAL_SUBSTITUTIONS, we find that it has no entry for ADJ_SAT, so the lookup with pos='s' raises the KeyError seen above; see https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1609:

MORPHOLOGICAL_SUBSTITUTIONS = {
    NOUN: [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
           ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'),
           ('men', 'man'), ('ies', 'y')],
    VERB: [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''),
           ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')],
    ADJ: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
    ADV: []}
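The failure itself is an ordinary dictionary lookup on a missing key. This stdlib-only sketch reproduces the last line of the traceback without NLTK, using the single-letter codes NLTK assigns to its POS constants; the noun and verb rule lists are deliberately abbreviated because only the keys matter here, and lookup_rules is a hypothetical helper name.

```python
# Single-letter codes NLTK assigns to its POS constants.
NOUN, VERB, ADJ, ADV, ADJ_SAT = 'n', 'v', 'a', 'r', 's'

# Same keys as NLTK's table (noun/verb rules abbreviated); no ADJ_SAT entry.
MORPHOLOGICAL_SUBSTITUTIONS = {
    NOUN: [('s', ''), ('ies', 'y')],
    VERB: [('s', ''), ('ing', '')],
    ADJ: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
    ADV: [],
}


def lookup_rules(pos):
    """Mimic the `self.MORPHOLOGICAL_SUBSTITUTIONS[pos]` line in _morphy()."""
    return MORPHOLOGICAL_SUBSTITUTIONS[pos]


print(sorted(MORPHOLOGICAL_SUBSTITUTIONS))  # ['a', 'n', 'r', 'v']
try:
    lookup_rules(ADJ_SAT)
except KeyError as err:
    print('KeyError:', err)  # KeyError: 's'
```

Under Python 3 the key prints as 's' rather than the u's' shown in the Python 2.7 traceback.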

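So a simple fix is to register the adjective rules for ADJ_SAT as well, by adding this line right after the MORPHOLOGICAL_SUBSTITUTIONS definition that starts at line 1609 of https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1609: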

MORPHOLOGICAL_SUBSTITUTIONS[ADJ_SAT] = MORPHOLOGICAL_SUBSTITUTIONS[ADJ]

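Proof of concept (integer keys stand in for the POS constants, with 3 playing ADJ and 5 playing ADJ_SAT):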

>>> MORPHOLOGICAL_SUBSTITUTIONS = {
...     1: [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
...            ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'),
...            ('men', 'man'), ('ies', 'y')],
...     2: [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''),
...            ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')],
...     3: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
...     4: []}
>>> 
>>> MORPHOLOGICAL_SUBSTITUTIONS[5] = MORPHOLOGICAL_SUBSTITUTIONS[3]
>>> MORPHOLOGICAL_SUBSTITUTIONS
{1: [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'), ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'), ('men', 'man'), ('ies', 'y')], 2: [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''), ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')], 3: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')], 4: [], 5: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')]}
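If editing the installed wordnet.py is not an option, one possible workaround (a sketch, not something the answer above proposes) is to query with pos=wn.ADJ, which already returns satellite synsets, and keep only the results whose pos() is 's'. The helper name satellite_synsets is hypothetical; it takes the synsets function as a parameter so it can be exercised offline, and fake_synsets is an illustrative stand-in that mimics the wn.synsets('able', pos=wn.ADJ) output shown earlier.

```python
from types import SimpleNamespace


def satellite_synsets(synsets, lemma):
    """Return only the adjective-satellite synsets for `lemma`.

    `synsets` is assumed to behave like nltk's wn.synsets: called with
    pos='a' (wn.ADJ) it returns head and satellite adjectives, and each
    result has a .pos() method returning 'a' or 's'.
    """
    return [s for s in synsets(lemma, pos='a') if s.pos() == 's']


def fake_synsets(lemma, pos=None):
    """Offline stand-in for illustration only (not NLTK)."""
    return [SimpleNamespace(name=f'{lemma}.{p}.{i:02d}', pos=lambda p=p: p)
            for i, p in enumerate('asss', start=1)]


print([s.name for s in satellite_synsets(fake_synsets, 'able')])
# ['able.s.02', 'able.s.03', 'able.s.04']
```

With NLTK installed, the call would be satellite_synsets(wn.synsets, 'able'); given the pos=wn.ADJ output above, it should keep able.s.02 through able.s.04.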

For the question "python - Why can't I pass wn.ADJ_SAT as pos when requesting synsets?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26900954/
