python - How to use NLTK BigramAssocMeasures.chi_sq

Reposted by: 行者123 Updated: 2023-11-30 21:50:31

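I have a list of words, and I want to measure how strongly two words are associated by looking at how often they co-occur. A paper I read says this can be computed with Pearson's chi-square test. I also found nltk.BigramAssocMeasures.chi_sq(), which computes chi-square values.

Can I use it for what I need? How do I compute the chi-square value with nltk?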

Best answer

Take a look at this blog from Streamhacker, which gives a good explanation with code examples.

One of the best metrics for information gain is chi square. NLTK includes this in the BigramAssocMeasures class in the metrics package. To use it, first we need to calculate a few frequencies for each word: its overall frequency and its frequency within each class. This is done with a FreqDist for overall frequency of words, and a ConditionalFreqDist where the conditions are the class labels. Once we have those numbers, we can score words with the BigramAssocMeasures.chi_sq function, then sort the words by score and take the top 10000. We then put these words into a set, and use a set membership test in our feature selection function to select only those words that appear in the set. Now each file is classified based on the presence of these high information words.
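The steps described in the quote can be sketched roughly as follows. This is a minimal illustration and not the blog's original code: the toy `labeled_docs` data, the helper name `score_words`, and the cutoff `top_n = 2` (instead of 10000) are assumptions made for the example, and it assumes NLTK 3, where `FreqDist` behaves like a `Counter`.

```python
from nltk.metrics import BigramAssocMeasures
from nltk.probability import ConditionalFreqDist, FreqDist

# Hypothetical toy corpus: (class label, tokens) pairs.
labeled_docs = [
    ("pos", ["great", "fun", "movie"]),
    ("pos", ["great", "acting", "movie"]),
    ("neg", ["boring", "plot", "movie"]),
    ("neg", ["boring", "bad", "movie"]),
]


def score_words(docs):
    """Score each word by summing its chi-square value over all classes."""
    word_fd = FreqDist()                   # overall frequency of each word
    label_word_fd = ConditionalFreqDist()  # frequency of each word per class
    for label, words in docs:
        for word in words:
            word_fd[word] += 1
            label_word_fd[label][word] += 1

    total_word_count = word_fd.N()
    scores = {}
    for word, freq in word_fd.items():
        score = 0.0
        for label in label_word_fd.conditions():
            # chi_sq(n_ii, (n_ix, n_xi), n_xx):
            #   n_ii = count of the word inside this class
            #   n_ix = overall count of the word
            #   n_xi = total word count of this class
            #   n_xx = total word count over all classes
            score += BigramAssocMeasures.chi_sq(
                label_word_fd[label][word],
                (freq, label_word_fd[label].N()),
                total_word_count,
            )
        scores[word] = score
    return scores


word_scores = score_words(labeled_docs)

# Keep the top-scoring words (the quote uses 10000; 2 fits the toy data).
top_n = 2
ranked = sorted(word_scores.items(), key=lambda item: item[1], reverse=True)
best_words = {word for word, _ in ranked[:top_n]}


def best_word_features(words):
    """Feature selection: keep only words that are in the high-score set."""
    return {word: True for word in words if word in best_words}
```

For the word-pair case in the question, the same call should apply with the counts read as bigram counts: `n_ii` is the number of times the two words co-occur, `n_ix` and `n_xi` are the counts of the first and second word, and `n_xx` is the total number of bigrams.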

For more on "python - How to use NLTK BigramAssocMeasures.chi_sq", see the similar question on Stack Overflow: https://stackoverflow.com/questions/15401497/
