gpt4 book ai didi

python - 特定单词的 NLTK 搭配

转载 作者:太空狗 更新时间:2023-10-29 17:46:44 27 4
gpt4 key购买 nike

我知道如何使用 NLTK 获取二元组和三元组搭配,并将它们应用到我自己的语料库中。代码如下。

不过我不确定 (1) 如何获取特定单词的搭配? (2) NLTK 是否有基于对数似然比的配置度量?

import nltk
from nltk.collocations import *
from nltk.tokenize import word_tokenize

text = "this is a foo bar bar black sheep foo bar bar black sheep foo bar bar black sheep shep bar bar black sentence"

trigram_measures = nltk.collocations.TrigramAssocMeasures()
finder = TrigramCollocationFinder.from_words(word_tokenize(text))

for i in finder.score_ngrams(trigram_measures.pmi):
print i

最佳答案

试试这段代码:

import nltk
from nltk.collocations import *
bigram_measures = nltk.collocations.BigramAssocMeasures()
trigram_measures = nltk.collocations.TrigramAssocMeasures()

# Ngrams with 'creature' as a member
creature_filter = lambda *w: 'creature' not in w


## Bigrams
finder = BigramCollocationFinder.from_words(
nltk.corpus.genesis.words('english-web.txt'))
# only bigrams that appear 3+ times
finder.apply_freq_filter(3)
# only bigrams that contain 'creature'
finder.apply_ngram_filter(creature_filter)
# return the 10 n-grams with the highest PMI
print finder.nbest(bigram_measures.likelihood_ratio, 10)


## Trigrams
finder = TrigramCollocationFinder.from_words(
nltk.corpus.genesis.words('english-web.txt'))
# only trigrams that appear 3+ times
finder.apply_freq_filter(3)
# only trigrams that contain 'creature'
finder.apply_ngram_filter(creature_filter)
# return the 10 n-grams with the highest PMI
print finder.nbest(trigram_measures.likelihood_ratio, 10)

它使用似然度量并过滤掉不包含单词“creature”的 Ngram

关于python - 特定单词的 NLTK 搭配,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/21165702/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com