
python - How to get a list of wrong predictions on the validation set

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:43:01

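I'm trying to build a text classification model on a database of website reviews (3 classes). I cleaned the DataFrame, tokenized it (using CountVectorizer), applied TF-IDF weighting (TfidfTransformer), and built an MNB (multinomial naive Bayes) model. Now that I have trained and evaluated the model, I want to get a list of the wrong predictions so that I can pass them to LIME and explore which words confuse the model.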

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    accuracy_score,
    roc_auc_score,
    roc_curve,
)

df = pd.read_csv(
    "https://raw.githubusercontent.com/m-braverman/ta_dm_course_data/master/train3.csv"
)
cleaned_df = df.drop(
    labels=["review_id", "user_id", "business_id", "review_date"], axis=1
)

x = cleaned_df["review_text"]
y = cleaned_df["business_category"]

# tokenization
vectorizer = CountVectorizer()
vectorizer_fit = vectorizer.fit(x)
bow_x = vectorizer_fit.transform(x)

# transform BOW to TF-IDF
transformer = TfidfTransformer()
transformer_x = transformer.fit(bow_x)
tfidf_x = transformer_x.transform(bow_x)

# split the dataset into a training set and a testing set
x_train, x_test, y_train, y_test = train_test_split(
    tfidf_x, y, test_size=0.3, random_state=101
)

mnb = MultinomialNB(alpha=0.14)
mnb.fit(x_train, y_train)

predmnb = mnb.predict(x_test)

My goal is to get the original indices of the reviews that the model predicted incorrectly.

Best answer

I managed to get the result like this:

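# Assumes `c` is a fitted model that accepts raw text (e.g. a pipeline) and that
# `preprocessed_df` has exactly the columns review_text, business_category, word_count.
predictions = c.predict(preprocessed_df['review_text'])
# Reuse the DataFrame's index so that join() aligns the predictions row by row
df2 = preprocessed_df.join(pd.DataFrame(predictions, index=preprocessed_df.index))
df2.columns = ['review_text', 'business_category', 'word_count', 'prediction']
df2[df2['business_category'] != df2['prediction']]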

I'm sure there is a more elegant way...
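One possibly tidier route, sketched here under assumptions rather than taken from the original thread: train_test_split keeps the index labels of pandas objects, so splitting the raw text Series (instead of the TF-IDF matrix) lets y_test.index point back at the original rows. The sketch swaps the separate fit/transform calls for a scikit-learn pipeline built with make_pipeline, which is a substitution and not the asker's code. The toy rows, the category names, and the names toy_df, model, wrong_mask, wrong_idx, and wrong_reviews are all hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy rows standing in for the real reviews data.
toy_df = pd.DataFrame(
    {
        "review_text": [
            "great pizza and pasta",
            "friendly dentist, clean office",
            "the haircut was quick and cheap",
            "burger was cold and the fries soggy",
            "the hygienist cleaned my teeth gently",
            "stylist ruined my hair color",
            "delicious sushi and fast service",
            "painless filling, great dentist",
            "best barber in town",
            "the waiter forgot our dessert",
        ],
        "business_category": [
            "Restaurants", "Health", "Beauty", "Restaurants", "Health",
            "Beauty", "Restaurants", "Health", "Beauty", "Restaurants",
        ],
    },
    index=range(100, 110),  # non-default labels, to show they survive the split
)

x = toy_df["review_text"]
y = toy_df["business_category"]

# Splitting the pandas Series keeps each row's original index label.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=101
)

# Same three steps as in the question, chained so they are fit on x_train only.
model = make_pipeline(
    CountVectorizer(), TfidfTransformer(), MultinomialNB(alpha=0.14)
)
model.fit(x_train, y_train)
pred = model.predict(x_test)

# Boolean mask in y_test order; True where the prediction is wrong.
wrong_mask = (y_test != pred).to_numpy()
wrong_idx = y_test.index[wrong_mask]

# Original rows that were misclassified, with the wrong prediction attached.
wrong_reviews = toy_df.loc[wrong_idx].assign(prediction=pred[wrong_mask])
print(wrong_reviews)
```

Because the pipeline takes raw strings, model.predict_proba can be called directly on the review text, which is usually the kind of function a text explainer such as LIME expects. Fitting the vectorizer inside the pipeline also means the vocabulary is learned from the training rows only.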

Regarding "python - How to get a list of wrong predictions on the validation set", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56754153/
