gpt4 book ai didi

Python NLTK : how to lemmatize text include verb in english?

转载 作者:太空宇宙 更新时间:2023-11-03 12:48:09 26 4
gpt4 key购买 nike

我想对这段文字进行词形还原,它只是对名词进行词形还原,我还需要对动词进行词形还原

    >>> import nltk, re, string
>>> from nltk.stem import WordNetLemmatizer
>>> from urllib import urlopen
>>> url="https://raw.githubusercontent.com/evandrix/nltk_data/master/corpora/europarl_raw/english/ep-00-01-17.en"
>>> raw = urlopen(url).read()
>>> raw ="".join(l for l in raw if l not in string.punctuation)
>>> tokens=nltk.word_tokenize(raw)
>>> from nltk.stem import WordNetLemmatizer
>>> lemmatizer = WordNetLemmatizer()
>>> lem = [lemmatizer.lemmatize(t) for t in tokens]
>>> lem[:20]
['Resumption', 'of', 'the', 'session', 'I', 'declare', 'resumed', 'the', 'session', 'of', 'the', 'European', 'Parliament', 'adjourned', 'on', 'Friday', '17', 'December', '1999', 'and']

这里像resumed这样的动词应该是resume你能告诉我我应该怎么做才能对整个文本进行词形还原

最佳答案

在wordnetlemmatizer中使用pos参数:

>>> from nltk.stem import WordNetLemmatizer
>>> from nltk import pos_tag
>>> wnl = WordNetLemmatizer()
>>> wnl.lemmatize('resumed')
'resumed'
>>> wnl.lemmatize('resumed', pos='v')
u'resume'

完整代码如下,带有pos_tag功能:

>>> from nltk import word_tokenize, pos_tag
>>> from nltk.stem import WordNetLemmatizer
>>> wnl = WordNetLemmatizer()
>>> txt = """Resumption of the session I declare resumed the session of the European Parliament adjourned on Friday 17 December 1999 , and I would like once again to wish you a happy new year in the hope that you enjoyed a pleasant festive period ."""
>>> [wnl.lemmatize(i,j[0].lower()) if j[0].lower() in ['a','n','v'] else wnl.lemmatize(i) for i,j in pos_tag(word_tokenize(txt))]
['Resumption', 'of', 'the', 'session', 'I', 'declare', u'resume', 'the', 'session', 'of', 'the', 'European', 'Parliament', u'adjourn', 'on', 'Friday', '17', 'December', '1999', ',', 'and', 'I', 'would', 'like', 'once', 'again', 'to', 'wish', 'you', 'a', 'happy', 'new', 'year', 'in', 'the', 'hope', 'that', 'you', u'enjoy', 'a', 'pleasant', 'festive', 'period', '.']

关于Python NLTK : how to lemmatize text include verb in english?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/24232702/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com