
python - Area under the precision-recall curve for DecisionTreeClassifier is a square


I am using scikit-learn's DecisionTreeClassifier to classify some data. I am also using other algorithms, and I compare them with the area under the precision-recall curve (AUPRC). The problem is that the AUPRC for DecisionTreeClassifier comes out shaped like a square, rather than the usual shape you would expect for this metric.

Below is how I compute the AUPRC for DecisionTreeClassifier. I had some trouble with it because DecisionTreeClassifier does not have a decision_function() the way other classifiers such as LogisticRegression do.

These are the AUPRC results I get for the SVM, logistic regression, and DecisionTreeClassifier.

Here is how I compute the AUPRC for the DecisionTreeClassifier:

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve


def execute(X_train, y_train, X_test, y_test):
    tree = DecisionTreeClassifier(class_weight='balanced')
    tree_y_score = tree.fit(X_train, y_train).predict(X_test)

    tree_ap_score = average_precision_score(y_test, tree_y_score)

    precision, recall, _ = precision_recall_curve(y_test, tree_y_score)
    values = {'ap_score': tree_ap_score, 'precision': precision, 'recall': recall}
    return values

Here is how I compute the AUPRC for the SVM:

from sklearn.svm import SVC
from sklearn.metrics import average_precision_score, precision_recall_curve


def execute(X_train, y_train, X_test, y_test):
    svm = SVC(class_weight='balanced')
    svm.fit(X_train, y_train.values.ravel())
    svm_y_score = svm.decision_function(X_test)

    svm_ap_score = average_precision_score(y_test, svm_y_score)

    precision, recall, _ = precision_recall_curve(y_test, svm_y_score)
    values = {'ap_score': svm_ap_score, 'precision': precision, 'recall': recall}
    return values

Here is how I compute the AUPRC for LogisticRegression:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve


def execute(X_train, y_train, X_test, y_test):
    lr = LogisticRegression(class_weight='balanced')
    lr.fit(X_train, y_train.values.ravel())
    lr_y_score = lr.decision_function(X_test)

    lr_ap_score = average_precision_score(y_test, lr_y_score)

    precision, recall, _ = precision_recall_curve(y_test, lr_y_score)
    values = {'ap_score': lr_ap_score, 'precision': precision, 'recall': recall}
    return values

Then I call these methods and plot the results like this:

import LogReg_AP_Harness as lrApTest
import SVM_AP_Harness as svmApTest
import DecTree_AP_Harness as dtApTest
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
import matplotlib.pyplot as plt


def do_work(df):
    # Select features/target with .loc and a boolean column mask
    # (df.ix is deprecated and removed in newer pandas versions)
    X = df.loc[:, df.columns != 'Class']
    y = df.loc[:, df.columns == 'Class']

    y_binarized = label_binarize(y, classes=[0, 1])
    n_classes = y_binarized.shape[1]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, random_state=0)

    _, _, y_train_binarized, y_test_binarized = train_test_split(X, y_binarized, test_size=.3, random_state=0)

    print('Executing Logistic Regression')
    lr_values = lrApTest.execute(X_train, y_train, X_test, y_test)
    print('Executing Decision Tree')
    dt_values = dtApTest.execute(X_train, y_train_binarized, X_test, y_test_binarized)
    print('Executing SVM')
    svm_values = svmApTest.execute(X_train, y_train, X_test, y_test)

    plot_aupr_curves(lr_values, svm_values, dt_values)


def plot_aupr_curves(lr_values, svm_values, dt_values):
    lr_ap_score = lr_values['ap_score']
    lr_precision = lr_values['precision']
    lr_recall = lr_values['recall']

    svm_ap_score = svm_values['ap_score']
    svm_precision = svm_values['precision']
    svm_recall = svm_values['recall']

    dt_ap_score = dt_values['ap_score']
    dt_precision = dt_values['precision']
    dt_recall = dt_values['recall']

    plt.step(svm_recall, svm_precision, color='g', alpha=0.2, where='post')
    plt.fill_between(svm_recall, svm_precision, step='post', alpha=0.2, color='g')

    plt.step(lr_recall, lr_precision, color='b', alpha=0.2, where='post')
    plt.fill_between(lr_recall, lr_precision, step='post', alpha=0.2, color='b')

    plt.step(dt_recall, dt_precision, color='r', alpha=0.2, where='post')
    plt.fill_between(dt_recall, dt_precision, step='post', alpha=0.2, color='r')

    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.ylim([0.0, 1.05])
    plt.xlim([0.0, 1.0])
    plt.title('SVM (Green): Precision-Recall curve: AP={0:0.2f}'.format(svm_ap_score) + '\n' +
              'Logistic Regression (Blue): Precision-Recall curve: AP={0:0.2f}'.format(lr_ap_score) + '\n' +
              'Decision Tree (Red): Precision-Recall curve: AP={0:0.2f}'.format(dt_ap_score))
    plt.show()

In the do_work() method, I had to binarize y because DecisionTreeClassifier does not have a decision_function(). I took that approach from here.
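For reference, here is a quick check (just an illustration, not part of my actual pipeline) of what label_binarize returns for an already-binary target. It is still hard 0/1 labels in a single column, so it does not give precision_recall_curve the continuous scores it needs:

from sklearn.preprocessing import label_binarize

y = [0, 1, 1, 0]
print(label_binarize(y, classes=[0, 1]))
# [[0]
#  [1]
#  [1]
#  [0]]
# One column of hard 0/1 labels in the binary case, not continuous scores.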

Here is the plot:

[AUPRC plot]

I suspect the bottom line is that I am computing the AUPRC for DecisionTreeClassifier incorrectly.

Best Answer

For DecisionTreeClassifier, replace predict with predict_proba; the latter plays the same role as a decision function. predict returns only hard 0/1 class labels, so precision_recall_curve sees just one informative threshold and the curve collapses into the square you are seeing, whereas predict_proba yields continuous scores that produce the usual curve shape.
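A minimal sketch of how the decision-tree scoring could look with predict_proba, assuming the positive class is labelled 1 so its probabilities sit in the second column:

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve


def execute(X_train, y_train, X_test, y_test):
    tree = DecisionTreeClassifier(class_weight='balanced')
    tree.fit(X_train, y_train)

    # predict_proba returns one column per class; use the probability of the
    # positive class (assumed label 1, second column) as a continuous score.
    tree_y_score = tree.predict_proba(X_test)[:, 1]

    tree_ap_score = average_precision_score(y_test, tree_y_score)
    precision, recall, _ = precision_recall_curve(y_test, tree_y_score)
    return {'ap_score': tree_ap_score, 'precision': precision, 'recall': recall}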

Regarding "python - Area under the precision-recall curve for DecisionTreeClassifier is a square", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49632828/
