
python - Sentiment analysis using predefined text

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:09:39

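I am working on a sentiment analysis project in Python using NLTK. The project's output must show whether a given statement is positive or negative. I have managed to do that, but how do I get an output for a neutral statement? And is it possible to report the result as percentages (i.e. positive %, negative %, or neutral %)?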

classifier.py

    import random
    import preprocess
    import nltk

    def get_classifier():
        data = preprocess.get_data()
        random.shuffle(data)

        split = int(0.8 * len(data))

        train_set = data[:split]
        test_set = data[split:]

        classifier = nltk.NaiveBayesClassifier.train(train_set)

        accuracy = nltk.classify.util.accuracy(classifier, test_set)
        print("Generated Classifier")
        print('-'*70)
        print("Accuracy: ", accuracy)
        return classifier

preprocess.py

    import nltk.classify
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    stop_words = stopwords.words("english")

    def create_word_features_pos(words):
        # One ({word: True}, 'positive') training example per non-stop word.
        useful_words = [word for word in words if word not in stop_words]
        my_list = [({word: True}, 'positive') for word in useful_words]
        return my_list

    def create_word_features_neg(words):
        # One ({word: True}, 'negative') training example per non-stop word.
        useful_words = [word for word in words if word not in stop_words]
        my_list = [({word: True}, 'negative') for word in useful_words]
        return my_list

    def create_word_features(words):
        # Map each known sentiment word in `words` to True (positive) or False (negative).
        useful_words = [word for word in words if word not in stop_words]

        pos_txt = get_tokenized_file(u"positive-words.txt")
        neg_txt = get_tokenized_file(u"negative-words.txt")

        my_dict = dict([(word, True) for word in pos_txt if word in useful_words])
        my_dict1 = dict([(word, False) for word in neg_txt if word in useful_words])
        my_dict.update(my_dict1)

        return my_dict

    def get_tokenized_file(file):
        # Read the whole file and split it into tokens; the file is closed on exit.
        with open(file, 'r') as f:
            return word_tokenize(f.read())

    def get_data():
        print("Collecting Negative Words")
        neg_txt = get_tokenized_file(u"negative-words.txt")
        neg_features = create_word_features_neg(neg_txt)

        print("Collecting Positive Words")
        pos_txt = get_tokenized_file(u"positive-words.txt")
        pos_features = create_word_features_pos(pos_txt)
        return pos_features + neg_features

    def process(data):
        # Lower-case and tokenize a raw input string.
        return [word.lower() for word in word_tokenize(data)]

Best answer

From the documentation for nltk.NaiveBayesClassifier.train:

Parameters: labeled_featuresets – A list of classified featuresets, i.e., a list of tuples (featureset, label).

This means your train_set is a list of (features, label) tuples.

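If you want to add a neutral class, you need to label some of your data as neutral; otherwise the classifier has no way to learn the new class.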

Right now you label your data as (word, True) and (word, False); an example of switching to three labels would be (word, 0), (word, 1), (word, 2).
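As a rough sketch of what three-label training data could look like: the question's get_data() already uses the string labels 'positive' and 'negative', so a third string label 'neutral' fits the same shape. The word lists and the label_words helper below are hypothetical stand-ins, not part of the original project; in particular, a neutral word list would have to be collected separately.

```python
import nltk

# Hypothetical toy word lists. The real project loads positive-words.txt and
# negative-words.txt; a neutral list is an assumption and must be supplied.
positive_words = ["good", "great", "excellent"]
negative_words = ["bad", "awful", "terrible"]
neutral_words = ["table", "monday", "report"]

def label_words(words, label):
    """Turn each word into a ({word: True}, label) training example."""
    return [({word: True}, label) for word in words]

# Same (featureset, label) shape as before, now with a third label.
train_set = (
    label_words(positive_words, "positive")
    + label_words(negative_words, "negative")
    + label_words(neutral_words, "neutral")
)

classifier = nltk.NaiveBayesClassifier.train(train_set)

# The classifier now knows all three labels.
print(sorted(classifier.labels()))
```

With data like this the classifier can return 'neutral' from classify(), but only for inputs that resemble the neutral examples it was trained on.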

nltk.NaiveBayesClassifier.prob_classify returns a probability distribution over the labels, from which you can read the probability of each label.
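One way to turn that distribution into percentages is sketched below. The tiny training set and the sentiment_percentages helper are hypothetical and only illustrate the calls; the text is split on whitespace rather than with word_tokenize so the example runs without downloading tokenizer data.

```python
import nltk

# Hypothetical toy training data with three labels.
train_set = [
    ({"good": True}, "positive"),
    ({"great": True}, "positive"),
    ({"bad": True}, "negative"),
    ({"awful": True}, "negative"),
    ({"table": True}, "neutral"),
    ({"report": True}, "neutral"),
]
classifier = nltk.NaiveBayesClassifier.train(train_set)

def sentiment_percentages(classifier, text):
    """Return a {label: percentage} dict for the given text."""
    # Build a featureset in the same {word: True} form used for training.
    features = {word: True for word in text.lower().split()}
    dist = classifier.prob_classify(features)
    # dist.samples() yields the labels; dist.prob(label) is in [0, 1].
    return {label: 100 * dist.prob(label) for label in dist.samples()}

percentages = sentiment_percentages(classifier, "a good report")
for label, pct in sorted(percentages.items()):
    print(f"{label}: {pct:.1f}%")
```

The percentages always sum to 100 across the labels the classifier was trained on, so a neutral percentage only appears once neutral examples are in the training data.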

The documentation can be found here: https://www.nltk.org/api/nltk.classify.html#nltk.classify.naivebayes.NaiveBayesClassifier

Regarding "python - Sentiment analysis using predefined text", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55154923/
