
python - How do I get the predicted probabilities of a classification model?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:02:49

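I am trying out different classification models on a binary dependent variable (occupied/not occupied). The models I am interested in are logistic regression, decision tree, and Gaussian naive Bayes.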

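My input data is a csv file with a datetime index (e.g. 2019-01-07 14:00), three feature columns ("R", "P", "C", holding numeric values), and a dependent-variable column ("value", holding binary values).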

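Training the models is not a problem; everything works. All the models give me their predictions as binary values (which of course should be the final result), but I would also like to see the predicted probabilities that led them to decide on one binary value or the other. Is there a way to get these values as well?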

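I have tried all the classification visualizers from the yellowbrick package that work here (ClassBalance, ROCAUC, ClassificationReport, ClassPredictionError). None of them gives me a plot showing the probabilities the model computed for the dataset.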

import pandas as pd
import numpy as np
data = pd.read_csv('testrooms_data.csv', parse_dates=['timestamp'])


from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

## split dataset into test and training set
X = data.drop("value", axis=1) # X contains all the features
y = data["value"] # y contains only the label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.5, random_state = 1)

###model training
###Logistic Regression###
clf_lr = LogisticRegression()

# fit the dataset into LogisticRegression Classifier

clf_lr.fit(X_train, y_train)
#predict on the unseen data
pred_lr = clf_lr.predict(X_test)

###Decision Tree###

from sklearn.tree import DecisionTreeClassifier

clf_dt = DecisionTreeClassifier()
pred_dt = clf_dt.fit(X_train, y_train).predict(X_test)

###Bayes###
from sklearn.naive_bayes import GaussianNB

bayes = GaussianNB()
pred_bayes = bayes.fit(X_train, y_train).predict(X_test)


###visualization for e.g. LogReg
from yellowbrick.classifier import ClassificationReport
from yellowbrick.classifier import ClassPredictionError
from yellowbrick.classifier import ROCAUC

#classificationreport
visualizer = ClassificationReport(clf_lr, support=True)

visualizer.fit(X_train, y_train) # Fit the visualizer and the model
visualizer.score(X_test, y_test) # Evaluate the model on the test data
g = visualizer.poof() # Draw/show/poof the data

#classprediction report
visualizer2 = ClassPredictionError(LogisticRegression())

visualizer2.fit(X_train, y_train) # Fit the training data to the visualizer
visualizer2.score(X_test, y_test) # Evaluate the model on the test data
g2 = visualizer2.poof() # Draw visualization

#(ROC)
visualizer3 = ROCAUC(LogisticRegression())

visualizer3.fit(X_train, y_train) # Fit the training data to the visualizer
visualizer3.score(X_test, y_test) # Evaluate the model on the test data
g3 = visualizer3.poof() # Draw/show/poof the data


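Ideally I would have, for example, an array similar to pred_lr that contains the probability computed for each row of the csv file. Is that possible? If so, how can I get it?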

Best answer

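Most sklearn classifiers (though not all; LinearSVC, for instance, does not) have a method for obtaining the probabilities behind the classification, either as log probabilities or as plain probabilities.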

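For example, if you have a naive Bayes classifier and you want the probabilities rather than the classification itself, you can do the following (I used the same naming as in your code):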

from sklearn.naive_bayes import GaussianNB

bayes = GaussianNB()
pred_bayes = bayes.fit(X_train, y_train).predict(X_test)

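# class probabilities per row, shape (n_samples, n_classes); columns follow bayes.classes_
proba_bayes = bayes.predict_proba(X_test)
# natural logarithm of the same probabilities
log_proba_bayes = bayes.predict_log_proba(X_test)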
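
The same call works for the other two models in the question. The following is a minimal sketch, not the asker's actual pipeline: it assumes a synthetic stand-in for the csv file (the random data, the `models` dictionary and the `p_occupied_*` column names are made up for illustration) and collects the probability of class 1 for every test row in one DataFrame that keeps the datetime index.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the question's csv file (assumption: the real data is not available here).
rng = np.random.default_rng(1)
index = pd.date_range("2019-01-07 14:00", periods=200, freq="60min", name="timestamp")
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["R", "P", "C"], index=index)
noise = rng.normal(scale=0.5, size=200)
y = ((X["R"] + X["P"] + noise) > 0).astype(int).rename("value")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)

# Hypothetical short names for the three models from the question.
models = {
    "lr": LogisticRegression(),
    "dt": DecisionTreeClassifier(random_state=1),
    "bayes": GaussianNB(),
}

# One row per test sample, one probability column per model.
proba = pd.DataFrame(index=X_test.index)
for name, model in models.items():
    model.fit(X_train, y_train)
    # Column order of predict_proba follows model.classes_; pick the column for class 1.
    col = list(model.classes_).index(1)
    proba[f"p_occupied_{name}"] = model.predict_proba(X_test)[:, col]

print(proba.head())
```

Note that an unpruned decision tree usually reports probabilities of exactly 0 or 1, because each leaf tends to contain training samples of a single class.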

Hope this helps.
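
To illustrate how these probabilities relate to the binary predictions the question starts from, here is a small sketch on synthetic data (an assumption standing in for the "R", "P", "C" columns). It shows that taking the most probable class per row gives the same result as `predict` for logistic regression, and that a custom cut-off can be applied to the probabilities instead; the 0.8 threshold is a hypothetical value chosen only for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data with three features (assumption: stands in for "R", "P", "C").
X, y = make_classification(
    n_samples=300, n_features=3, n_informative=3, n_redundant=0, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)

clf_lr = LogisticRegression().fit(X_train, y_train)
proba_lr = clf_lr.predict_proba(X_test)  # shape (n_samples, 2), columns follow clf_lr.classes_
pred_lr = clf_lr.predict(X_test)

# Picking the most probable class per row reproduces predict().
pred_from_proba = clf_lr.classes_[np.argmax(proba_lr, axis=1)]
print(np.array_equal(pred_from_proba, pred_lr))

# Hypothetical stricter cut-off: only label a row as class 1 when its probability is >= 0.8.
threshold = 0.8
pred_strict = (proba_lr[:, 1] >= threshold).astype(int)
print(pred_strict.sum(), "rows labelled class 1 at threshold", threshold)
```

Working from the probabilities rather than from `predict` makes the decision rule explicit, which is useful when one kind of error is more costly than the other.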

Regarding "python - How do I get the predicted probabilities of a classification model?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55395874/
