
python - NLTK classifier gives only negative answers in sentiment analysis

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:13:48

I am using NLTK for sentiment analysis, training on the built-in movie_reviews corpus, and every time I get neg as the result.

My code:

import nltk
import random
import pickle
from nltk.corpus import movie_reviews
from os.path import exists
from nltk.classify import apply_features
from nltk.tokenize import word_tokenize, sent_tokenize

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())
all_words = nltk.FreqDist(all_words)
word_features = list(all_words.keys())
print(word_features)

def find_features(document):
    words = set(document)
    features = {}
    # One boolean feature per vocabulary word: True if present, False if absent
    for w in word_features:
        features[w] = (w in words)
    return features

featuresets = [(find_features(rev), category) for (rev, category) in documents]
numtrain = int(len(documents) * 90 / 100)
training_set = apply_features(find_features, documents[:numtrain])
testing_set = apply_features(find_features, documents[numtrain:])

classifier = nltk.NaiveBayesClassifier.train(training_set)
classifier.show_most_informative_features(15)

Example_Text = " avoids annual conveys vocal thematic doubts fascination slip avoids outstanding thematic astounding seamless"

doc = word_tokenize(Example_Text.lower())
featurized_doc = {i:(i in doc) for i in word_features}
tagged_label = classifier.classify(featurized_doc)
print(tagged_label)

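I am using the NaiveBayes classifier here: I train it on the movie_reviews corpus and then use the trained classifier to test the sentiment of my Example_Text.

As you can see, my Example_Text consists of some random words. When I run classifier.show_most_informative_features(15), it gives me a list of the 15 words with the highest ratio between the positive and negative classes. I picked the positive words shown in this list.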

Most Informative Features
     avoids = True    pos : neg = 12.1 : 1.0
  insulting = True    neg : pos = 10.8 : 1.0
  atrocious = True    neg : pos = 10.6 : 1.0
outstanding = True    pos : neg = 10.2 : 1.0
   seamless = True    pos : neg = 10.1 : 1.0
   thematic = True    pos : neg = 10.1 : 1.0
 astounding = True    pos : neg = 10.1 : 1.0
       3000 = True    neg : pos = 9.9 : 1.0
     hudson = True    neg : pos = 9.9 : 1.0
  ludicrous = True    neg : pos = 9.8 : 1.0
      dread = True    pos : neg = 9.5 : 1.0
      vocal = True    pos : neg = 9.5 : 1.0
    conveys = True    pos : neg = 9.5 : 1.0
     annual = True    pos : neg = 9.5 : 1.0
       slip = True    pos : neg = 9.5 : 1.0

So why don't I get pos as the result? Why do I always get neg, even though the classifier has been trained correctly?

Best answer

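The problem is that you include every word as a feature, and the features of the form "word: False" create a lot of extra noise that drowns out these positive features. I looked at the two log probabilities and they are very close: -812 versus -808. In this kind of problem it is usually appropriate to use only "word: True" style features, because all the others just add noise.

I copied your code but modified the last three lines as follows: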
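To see how absent-word features can outweigh a handful of strong present-word features, here is a toy sketch of the arithmetic. All numbers in it are hypothetical and chosen only for illustration; they are not measured from the movie_reviews classifier, and the function name log_odds is made up for this example.

```python
import math

def log_odds(present_ratios, n_absent, absent_ratio):
    """Sum per-feature log2 likelihood ratios (pos over neg).

    present_ratios: pos:neg ratios for the words that occur in the text.
    n_absent:       number of vocabulary words that do NOT occur in the text.
    absent_ratio:   assumed pos:neg ratio contributed by each absent word.
    A positive total favours 'pos', a negative total favours 'neg'.
    """
    present = sum(math.log2(r) for r in present_ratios)
    absent = n_absent * math.log2(absent_ratio)
    return present + absent

# Ten strongly positive words, each with a hypothetical 10:1 ratio.
strong_words = [10.0] * 10

# Using only word: True features, the evidence clearly favours 'pos'.
only_true = log_odds(strong_words, n_absent=0, absent_ratio=1.0)

# Adding 39,000 word: False features, each leaning toward 'neg' by a
# hypothetical 0.1%, is enough to flip the sign of the total.
with_false = log_odds(strong_words, n_absent=39000, absent_ratio=0.999)

print(only_true > 0, with_false < 0)
```

Each absent word contributes almost nothing on its own, but tens of thousands of them add up to more than the few informative words can offset.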

featurized_doc = {c:True for c in Example_Text.split()}
tagged_label = classifier.classify(featurized_doc)
print(tagged_label)

and got the output 'pos'.

Regarding python - NLTK classifier gives only negative answers in sentiment analysis, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36998379/
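The same idea can be wrapped in a small helper that builds only "word: True" features. This is a minimal sketch, not part of the original answer: the name present_features and the tiny vocab set are hypothetical, and it assumes simple whitespace tokenization and lowercasing, as in the question's code.

```python
def present_features(text, vocabulary):
    """Return {word: True} for each token of `text` that is in `vocabulary`.

    Absent words are simply left out, so no word: False features are created.
    """
    tokens = text.lower().split()
    return {tok: True for tok in tokens if tok in vocabulary}

# Hypothetical stand-in for word_features from the question.
vocab = {"avoids", "outstanding", "seamless", "insulting"}

features = present_features(" avoids outstanding plot avoids", vocab)
print(features)  # {'avoids': True, 'outstanding': True}
```

With the trained classifier from the question, the resulting dict could be passed to classifier.classify(...) in place of featurized_doc.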
