python - Lemmatizing a string according to POS (NLP)


Reposted by: 太空宇宙. Updated: 2023-11-04 05:15:30

I am trying to lemmatize a string according to each word's part of speech, but I get an error at the final stage. My code:

import nltk
from nltk.stem import *
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import wordnet
wordnet_lemmatizer = WordNetLemmatizer()
text = word_tokenize('People who help the blinging lights are the way of the future and are heading properly to their goals')
tagged = nltk.pos_tag(text)

def get_wordnet_pos(treebank_tag):

    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        return ''

for word in tagged: print(wordnet_lemmatizer.lemmatize(word,pos='v'), end=" ")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-40-afb22c78f770> in <module>()
----> 1 for word in tagged: print(wordnet_lemmatizer.lemmatize(word,pos='v'), end=" ")

E:\Miniconda3\envs\uol1\lib\site-packages\nltk\stem\wordnet.py in lemmatize(self, word, pos)
     38 
     39     def lemmatize(self, word, pos=NOUN):
---> 40         lemmas = wordnet._morphy(word, pos)
     41         return min(lemmas, key=len) if lemmas else word
     42 

E:\Miniconda3\envs\uol1\lib\site-packages\nltk\corpus\reader\wordnet.py in _morphy(self, form, pos)
   1710 
   1711         # 1. Apply rules once to the input to get y1, y2, y3, etc.
-> 1712         forms = apply_rules([form])
   1713 
   1714         # 2. Return all that are in the database (and check the original too)

E:\Miniconda3\envs\uol1\lib\site-packages\nltk\corpus\reader\wordnet.py in apply_rules(forms)
   1690         def apply_rules(forms):
   1691             return [form[:-len(old)] + new
-> 1692                     for form in forms
   1693                     for old, new in substitutions
   1694                     if form.endswith(old)]

E:\Miniconda3\envs\uol1\lib\site-packages\nltk\corpus\reader\wordnet.py in <listcomp>(.0)
   1692                     for form in forms
   1693                     for old, new in substitutions
-> 1694                     if form.endswith(old)]
   1695 
   1696         def filter_forms(forms):

I would like to lemmatize this string in one pass, according to the part of speech of each word. Please help.

Best answer

First, try not to mix top-level, absolute, and relative imports like this:

import nltk
from nltk.stem import *
from nltk import pos_tag, word_tokenize

This would be better:

from nltk import sent_tokenize, word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet as wn

(See Absolute vs. explicit relative import of Python module)

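The error you are getting is most likely because you are passing the output of pos_tag, a list of (word, tag) tuples, directly to WordNetLemmatizer.lemmatize(), i.e.: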

>>> from nltk import pos_tag
>>> from nltk.stem import WordNetLemmatizer

>>> wnl = WordNetLemmatizer()
>>> sent = 'People who help the blinging lights are the way of the future and are heading properly to their goals'.split()

>>> pos_tag(sent)
[('People', 'NNS'), ('who', 'WP'), ('help', 'VBP'), ('the', 'DT'), ('blinging', 'NN'), ('lights', 'NNS'), ('are', 'VBP'), ('the', 'DT'), ('way', 'NN'), ('of', 'IN'), ('the', 'DT'), ('future', 'NN'), ('and', 'CC'), ('are', 'VBP'), ('heading', 'VBG'), ('properly', 'RB'), ('to', 'TO'), ('their', 'PRP$'), ('goals', 'NNS')]
>>> pos_tag(sent)[0]
('People', 'NNS')

>>> first_word = pos_tag(sent)[0]
>>> wnl.lemmatize(first_word)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/nltk/stem/wordnet.py", line 40, in lemmatize
    lemmas = wordnet._morphy(word, pos)
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 1712, in _morphy
    forms = apply_rules([form])
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 1694, in apply_rules
    if form.endswith(old)]
AttributeError: 'tuple' object has no attribute 'endswith'
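The same failure can be reproduced without NLTK, which makes the cause easier to see. The sketch below is only an illustration: `tagged_token` and `describe_endswith` are hypothetical names, and the tuple stands in for one element of the `pos_tag` output shown above.

```python
# A (word, tag) pair shaped like one element of the pos_tag() output above.
tagged_token = ('lights', 'NNS')


def describe_endswith(value, suffix='s'):
    """Return value.endswith(suffix), or the error message if that fails."""
    try:
        return value.endswith(suffix)
    except AttributeError as err:
        return str(err)


# Passing the whole tuple fails, as in the traceback.
print(describe_endswith(tagged_token))

# Unpacking the pair first yields a plain str, which has .endswith().
word, tag = tagged_token
print(describe_endswith(word))
```

The first call reports the missing `endswith` attribute on the tuple; the second succeeds because only the word is passed.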

The input to WordNetLemmatizer.lemmatize() should be a str, not a tuple, so if you do this:

>>> from nltk.corpus import wordnet as wn
>>> tagged_sent = pos_tag(sent)

>>> def penn2morphy(penntag, returnNone=False):
...     # Map the first two characters of a Penn Treebank tag to a WordNet POS.
...     morphy_tag = {'NN': wn.NOUN, 'JJ': wn.ADJ,
...                   'VB': wn.VERB, 'RB': wn.ADV}
...     try:
...         return morphy_tag[penntag[:2]]
...     except KeyError:
...         return None if returnNone else ''
...

>>> for word, tag in tagged_sent:
...     wntag = penn2morphy(tag)
...     if wntag:
...         print(wnl.lemmatize(word, pos=wntag))
...     else:
...         print(word)
...
People
who
help
the
blinging
light
be
the
way
of
the
future
and
be
head
properly
to
their
goal
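Since the question asks for the whole string to be lemmatized at once, the loop above can be wrapped into a function that returns a single string. This is a minimal Python 3 sketch, not the answer author's code: `penn_to_wordnet`, `lemmatize_tagged`, and `lemmatize_string` are hypothetical names, the one-letter codes are assumed to equal `wn.NOUN`, `wn.VERB`, `wn.ADJ`, and `wn.ADV`, and the NLTK tokenizer, tagger, and WordNet data are assumed to be downloaded already.

```python
# Penn Treebank tag prefixes mapped to WordNet POS codes
# (assumed equal to wn.NOUN, wn.VERB, wn.ADJ, wn.ADV in nltk.corpus.wordnet).
PENN_PREFIX_TO_WORDNET = {'NN': 'n', 'VB': 'v', 'JJ': 'a', 'RB': 'r'}


def penn_to_wordnet(penn_tag):
    """Return the WordNet POS code for a Penn tag, or None if there is none."""
    return PENN_PREFIX_TO_WORDNET.get(penn_tag[:2])


def lemmatize_tagged(tagged_tokens, lemmatize):
    """Lemmatize (word, tag) pairs with a lemmatize(word, pos=...) callable."""
    lemmas = []
    for word, tag in tagged_tokens:
        wn_pos = penn_to_wordnet(tag)
        # Words without a WordNet POS (e.g. 'the', 'of') are kept unchanged.
        lemmas.append(lemmatize(word, pos=wn_pos) if wn_pos else word)
    return lemmas


def lemmatize_string(text):
    """Tokenize, tag, and lemmatize text with NLTK; return one string."""
    # Imported here so the helpers above also work without NLTK installed.
    from nltk import pos_tag, word_tokenize
    from nltk.stem import WordNetLemmatizer

    wnl = WordNetLemmatizer()
    tagged = pos_tag(word_tokenize(text))
    return ' '.join(lemmatize_tagged(tagged, wnl.lemmatize))
```

Calling `lemmatize_string` on the sentence from the question should then return one space-separated string. The exact lemmas depend on the tagger model and WordNet version installed, so they may differ slightly from the output listed above.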

Or, if you prefer the easy way:

pip install pywsd

Then:

>>> from pywsd.utils import lemmatize, lemmatize_sentence
>>> sent = 'People who help the blinging lights are the way of the future and are heading properly to their goals'
>>> lemmatize_sentence(sent)
['people', 'who', 'help', 'the', u'bling', u'light', u'be', 'the', 'way', 'of', 'the', 'future', 'and', u'be', u'head', 'properly', 'to', 'their', u'goal']

Regarding "python - Lemmatizing a string according to POS (NLP)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41824782/
