python - Words missing from the NLTK vocabulary - Python

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:51:33

I'm testing the vocabulary of the NLTK package. I ran the following code and expected every line to print True:

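import nltk

# The word list must be downloaded once beforehand: nltk.download('words')
english_vocab = set(w.lower() for w in nltk.corpus.words.words())

# Membership in a set is an exact string match, so inflected forms only
# match if they appear in the word list verbatim.
print('answered' in english_vocab)
print('unanswered' in english_vocab)
print('altered' in english_vocab)
print('alter' in english_vocab)
print('looks' in english_vocab)
print('look' in english_vocab)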

But I got the following results. Are that many words really missing, or just certain forms of words? Am I missing something?

False
True
False
True
False
True

Best answer

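Indeed, the corpus is not an exhaustive list of all English words. nltk.corpus.words is a fixed word list that largely omits inflected forms such as "answered", "altered" and "looks", which is exactly what your output shows. A more appropriate way to check whether a word is a valid English word is to use WordNet, whose lookup also recognizes inflected forms: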

from nltk.corpus import wordnet as wn

# Requires the WordNet data, downloaded once with: nltk.download('wordnet')
print(wn.synsets('answered'))
# [Synset('answer.v.01'), Synset('answer.v.02'), Synset('answer.v.03'), Synset('answer.v.04'), Synset('answer.v.05'), Synset('answer.v.06'), Synset('suffice.v.01'), Synset('answer.v.08'), Synset('answer.v.09'), Synset('answer.v.10')]

print(wn.synsets('unanswered'))
# [Synset('unanswered.s.01')]

# An empty list means WordNet does not know the word
print(wn.synsets('notaword'))
# []
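Why does WordNet find "answered" when the word list does not? wn.synsets() first passes the word through WordNet's morphological processor (also exposed directly as wn.morphy), which strips inflectional endings, so "answered" is looked up as "answer". The sketch below imitates that idea in plain Python so it runs without any NLTK data. The toy vocabulary BASE_FORMS, the rule table SUFFIX_RULES and the helpers candidate_base_forms and in_vocab are hypothetical illustrations, loosely modeled on WordNet's verb suffix rules; they are not NLTK's actual implementation.

```python
# Hypothetical toy vocabulary standing in for a base-form word list
# such as nltk.corpus.words (NOT the real corpus contents).
BASE_FORMS = {"answer", "unanswered", "alter", "look"}

# Hypothetical, simplified suffix rules (suffix -> replacement), loosely
# modeled on WordNet's verb detachment rules; not NLTK's full rule table.
SUFFIX_RULES = [
    ("ies", "y"),
    ("es", "e"),
    ("es", ""),
    ("ed", "e"),
    ("ed", ""),
    ("ing", "e"),
    ("ing", ""),
    ("s", ""),
]


def candidate_base_forms(word):
    """Return the word itself plus every form produced by one suffix rule."""
    word = word.lower()
    candidates = [word]
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidates.append(word[: -len(suffix)] + replacement)
    return candidates


def in_vocab(word, vocab=BASE_FORMS):
    """True if the word or one of its candidate base forms is in vocab."""
    return any(form in vocab for form in candidate_base_forms(word))


if __name__ == "__main__":
    words = ["answered", "unanswered", "altered", "alter", "looks", "look", "notaword"]
    for w in words:
        # Column 2: exact set lookup; column 3: lookup after suffix stripping
        print(w, w.lower() in BASE_FORMS, in_vocab(w))
```

Running it prints False in the second column for "answered", "altered" and "looks" (the same pattern as in the question) but True in the third column once a suffix is stripped, while "notaword" stays False. In real code you would rely on wn.synsets or wn.morphy rather than maintaining your own rules.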

A similar question about python - Words missing from the NLTK vocabulary - Python can be found on Stack Overflow: https://stackoverflow.com/questions/34756738/
