
python - Classification_report between two files

Reposted. Author: 行者123. Updated: 2023-11-30 08:58:00

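I am trying to score one file against another. Both contain the same data but with different labels. The labels in the training data are correct, while the labels in the test data are not necessarily correct... I would like to know the accuracy, recall, and F-score.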

import pandas as pd
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer

df_train = pd.read_csv('train.csv', sep = ',')
df_test = pd.read_csv('teste.csv', sep = ',')

vec_train = TfidfVectorizer()
X_train = vec_train.fit_transform(df_train['text'])
y_train = df_train['label']

vec_test = TfidfVectorizer()
X_test = vec_test.fit_transform(df_train['text'])
y_test = df_test['label']

clf = LogisticRegression(penalty='l2', multi_class = 'multinomial',solver ='newton-cg')

y_pred = clf.predict(X_test)

print ("Accuracy on training set:")
print (clf.score(X_train, y_train))
print ("Accuracy on testing set:")
print (clf.score(X_test, y_test))
print ("Classification Report:")
print (metrics.classification_report(y_test, y_pred))

A silly example of the data:

TRAIN
text,label
dogs are cool,animal
flowers are beautifil,plants
pen is mine,objet
beyonce is an artist,person

TEST
text,label
dogs are cool,objet
flowers are beautifil,plants
pen is mine,person
beyonce is an artist,animal

Error:

Traceback (most recent call last):
  File "accuracy.py", line 30, in <module>
    y_pred = clf.predict(X_test)
  File "/usr/lib/python3/dist-packages/sklearn/linear_model/base.py", line 324, in predict
    scores = self.decision_function(X)
  File "/usr/lib/python3/dist-packages/sklearn/linear_model/base.py", line 298, in decision_function
    "yet" % {'name': type(self).__name__})
sklearn.exceptions.NotFittedError: This LogisticRegression instance is not fitted yet

I just want to compute the accuracy on the test set.

Best answer

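You are fitting a new TfidfVectorizer for the test data. That gives wrong results, because a separately fitted vectorizer learns its own vocabulary, so its columns do not line up with the features the model is trained on. You should use the same object that was fitted on the training data.

Do this: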

vec_train = TfidfVectorizer()
X_train = vec_train.fit_transform(df_train['text'])

X_test = vec_train.transform(df_test['text'])
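
To illustrate why the fitted vectorizer has to be reused, here is a minimal sketch on toy in-memory texts. The lists `train_texts` and `test_texts` and the extra `vec_other` object are made up for illustration and stand in for the CSV columns from the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy texts standing in for df_train['text'] and df_test['text'].
train_texts = ["dogs are cool", "flowers are beautiful", "pen is mine"]
test_texts = ["dogs are mine", "beyonce is an artist"]

# Fit once on the training texts, then only transform the test texts.
vec_train = TfidfVectorizer()
X_train = vec_train.fit_transform(train_texts)
X_test = vec_train.transform(test_texts)

# A separately fitted vectorizer builds its own, different vocabulary.
vec_other = TfidfVectorizer()
X_other = vec_other.fit_transform(test_texts)

print("train features:", X_train.shape[1])
print("test features (shared vectorizer):", X_test.shape[1])
print("test features (separate vectorizer):", X_other.shape[1])
```

With the shared vectorizer, the training and test matrices have the same columns in the same order, which is what a model fitted on `X_train` expects.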

After that, as @MohammedKashif said, you need to train your LogisticRegression model first and only then predict on the test set.

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
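
The error in the question can be reproduced on a small scale. The sketch below assumes two made-up texts and a `LogisticRegression` with default settings (not the settings from the question): calling `predict` before `fit` raises `NotFittedError`, and the same call succeeds once the model has been fitted.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical two-row dataset, just enough to fit a classifier.
texts = ["dogs are cool", "pen is mine"]
labels = ["animal", "objet"]
X = TfidfVectorizer().fit_transform(texts)

clf = LogisticRegression()  # default settings, assumed for brevity

# Predicting with an unfitted model fails.
try:
    clf.predict(X)
    raised = False
except NotFittedError:
    raised = True
print("NotFittedError before fit:", raised)

# After fitting, the same call works.
clf.fit(X, labels)
y_pred = clf.predict(X)
print("predictions after fit:", list(y_pred))
```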

After that you can use your scoring code without any errors.
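
Putting the pieces together, the whole flow might look like the sketch below. It is not the asker's exact script: the two CSV files are replaced by in-memory `io.StringIO` objects holding the sample data from the question, and `LogisticRegression` is used with its default settings rather than the ones from the question.

```python
import io

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# In-memory stand-ins for train.csv and teste.csv (sample data from the question).
train_csv = io.StringIO(
    "text,label\n"
    "dogs are cool,animal\n"
    "flowers are beautifil,plants\n"
    "pen is mine,objet\n"
    "beyonce is an artist,person\n"
)
test_csv = io.StringIO(
    "text,label\n"
    "dogs are cool,objet\n"
    "flowers are beautifil,plants\n"
    "pen is mine,person\n"
    "beyonce is an artist,animal\n"
)
df_train = pd.read_csv(train_csv)
df_test = pd.read_csv(test_csv)

# Fit the vectorizer on the training texts only, then reuse it for the test texts.
vec_train = TfidfVectorizer()
X_train = vec_train.fit_transform(df_train['text'])
X_test = vec_train.transform(df_test['text'])
y_train = df_train['label']
y_test = df_test['label']

# Train first, then predict.
clf = LogisticRegression()  # default settings, assumed for brevity
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Score the predictions against the (possibly wrong) labels in the test file.
test_acc = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print("Accuracy on training set:", clf.score(X_train, y_train))
print("Accuracy on testing set:", test_acc)
print("Classification Report:")
print(report)
```

Because the test file's labels are the ones being checked, the report here measures how far those labels disagree with what the model learned from the training file.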

Regarding python - Classification_report between two files, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52261948/
