
python - Facing ValueError: Target is multiclass but average='binary'

Reposted. Author: 太空狗. Updated: 2023-10-30 02:36:48

I am new to Python and machine learning. For my task, I am trying to use the Naive Bayes algorithm on my dataset.

I was able to compute the accuracy, but when I try to compute precision and recall, it throws the following error:

ValueError: Target is multiclass but average='binary'. Please choose another average setting.
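The error is raised by scikit-learn's target-type check: `precision_score` and `recall_score` default to `average='binary'`, which only accepts targets with at most two distinct labels. A minimal sketch that reproduces it, assuming toy labels with a hypothetical third class `neutral` (not from the question's data):

```python
from sklearn.metrics import precision_score
from sklearn.utils.multiclass import type_of_target

# Toy labels (hypothetical): three distinct classes make the target "multiclass".
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]

print(type_of_target(y_true))  # multiclass

message = ""
try:
    # The default average='binary' is only valid for binary targets.
    precision_score(y_true, y_pred, pos_label="positive")
except ValueError as err:
    message = str(err)
    print(message)
```

With only `positive` and `negative` present, `type_of_target` would report `binary` and the same call would succeed.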

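Can anyone suggest how to proceed? I tried passing average='micro' to precision_score and recall_score. It runs without errors, but it gives the same value for accuracy, precision, and recall.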

My dataset:

train_data.csv:

review,label
Colors & clarity is superb,positive
Sadly the picture is not nearly as clear or bright as my 40 inch Samsung,negative

test_data.csv:

review,label
The picture is clear and beautiful,positive
Picture is not clear,negative

My code:

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import confusion_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score


def load_data(filename):
    reviews = list()
    labels = list()
    with open(filename) as file:
        file.readline()
        for line in file:
            line = line.strip().split(',')
            labels.append(line[1])
            reviews.append(line[0])

    return reviews, labels

X_train, y_train = load_data('/Users/abc/Sep_10/train_data.csv')
X_test, y_test = load_data('/Users/abc/Sep_10/test_data.csv')

vec = CountVectorizer()

X_train_transformed = vec.fit_transform(X_train)

X_test_transformed = vec.transform(X_test)

clf= MultinomialNB()
clf.fit(X_train_transformed, y_train)

score = clf.score(X_test_transformed, y_test)
print("score of Naive Bayes algo is :" , score)

y_pred = clf.predict(X_test_transformed)
print(confusion_matrix(y_test,y_pred))

print("Precision Score : ",precision_score(y_test,y_pred,pos_label='positive'))
print("Recall Score :" , recall_score(y_test, y_pred, pos_label='positive') )
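Since the sample files contain only `positive` and `negative`, it may be worth checking why the labels end up multiclass at all. One possible cause (an assumption, not confirmed in the question): `line.strip().split(',')` treats the text after the first comma as the label, so a review that itself contains a comma contributes a fragment of the review as its "label". A sketch of a hypothetical `parse_rows` helper that splits on the last comma instead, compared with the naive split on made-up rows:

```python
from sklearn.utils.multiclass import type_of_target


def parse_rows(lines):
    """Hypothetical helper: parse 'review,label' lines, skipping the header.

    rsplit(',', 1) splits only on the last comma, so commas inside the
    review text stay part of the review. Lines without a comma would raise.
    """
    reviews, labels = [], []
    for line in list(lines)[1:]:  # skip the "review,label" header
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        review, label = line.rsplit(",", 1)
        reviews.append(review)
        labels.append(label)
    return reviews, labels


# Hypothetical rows; the second review contains a comma.
rows = [
    "review,label",
    "Colors & clarity is superb,positive",
    "Bright, sharp picture,positive",
    "Picture is not clear,negative",
]

# Same logic as load_data: the label is whatever follows the first comma.
naive_labels = [r.strip().split(",")[1] for r in rows[1:]]
print(sorted(set(naive_labels)))  # [' sharp picture', 'negative', 'positive']
print(type_of_target(naive_labels))  # multiclass

reviews, labels = parse_rows(rows)
print(sorted(set(labels)))  # ['negative', 'positive']
print(type_of_target(labels))  # binary
```

Printing `set(y_train)` and `set(y_test)` on the real files would show whether this applies. If the files use proper CSV quoting, the standard `csv` module is the more general option.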

Best answer

You need to add the 'average' parameter. According to the documentation:

average : string, [None, ‘binary’ (default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’]

This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

Do this:

print("Precision Score : ", precision_score(y_test, y_pred,
                                           pos_label='positive',
                                           average='micro'))
print("Recall Score : ", recall_score(y_test, y_pred,
                                     pos_label='positive',
                                     average='micro'))

You can replace 'micro' with any of the options above except 'binary'. Also, in a multiclass setting there is no need to pass 'pos_label', because it is ignored anyway.
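To see how the settings differ, here is a sketch on hypothetical three-class labels (not the question's data). `average=None` returns one score per class, ordered by the sorted labels, while `'macro'` (unweighted mean), `'weighted'` (mean weighted by each class's true count) and `'micro'` (pooled counts) each collapse them into a single number:

```python
from sklearn.metrics import precision_score

# Hypothetical three-class labels.
y_true = ["neg", "neg", "neu", "pos", "pos", "pos"]
y_pred = ["neg", "pos", "neu", "pos", "pos", "neg"]

scores = {}
for average in [None, "macro", "weighted", "micro"]:
    scores[average] = precision_score(y_true, y_pred, average=average)
    print(average, scores[average])
```

Here the per-class precisions are 1/2 (neg), 1 (neu) and 2/3 (pos), so `'macro'` gives 13/18 while `'weighted'` and `'micro'` both happen to give 2/3.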

Update from the comments:

Yes, they can be equal. This is stated in the user guide:

Note that for “micro”-averaging in a multiclass setting with all labels included will produce equal precision, recall and F, while “weighted” averaging may produce an F-score that is not between precision and recall.
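The reason: in single-label multiclass classification, each misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class). The pooled FP and FN totals are therefore equal, and micro precision, micro recall and micro F1 all reduce to correct predictions divided by total predictions, which is accuracy. A quick check on hypothetical labels:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical three-class labels: 4 of 6 predictions are correct.
y_true = ["neg", "neg", "neu", "pos", "pos", "pos"]
y_pred = ["neg", "pos", "neu", "pos", "pos", "neg"]

acc = accuracy_score(y_true, y_pred)
p = precision_score(y_true, y_pred, average="micro")
r = recall_score(y_true, y_pred, average="micro")
f = f1_score(y_true, y_pred, average="micro")
print(acc, p, r, f)  # all four equal 4/6 (about 0.667) for these labels
```

With `average='macro'` or `average='weighted'`, precision and recall generally differ from accuracy.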

For python - Facing ValueError: Target is multiclass but average='binary', a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52269187/
