
python - Computing sklearn.roc_auc_score for multi-class

Reposted. Author: 太空狗. Updated: 2023-10-29 17:13:59

I want to compute the AUC, precision, and accuracy of my classifier. I am doing supervised learning:

Here is my working code. It works for binary classes but not for multi-class. Please assume you have a dataframe with binary classes:

    # Imports match the pre-0.18 scikit-learn API implied by StratifiedKFold(y, k) below
    from sklearn.cross_validation import StratifiedKFold
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

    sample_features_dataframe = self._get_sample_features_dataframe()
    labeled_sample_features_dataframe = retrieve_labeled_sample_dataframe(sample_features_dataframe)
    labeled_sample_features_dataframe, binary_class_series, multi_class_series = self._prepare_dataframe_for_learning(labeled_sample_features_dataframe)

    # Running totals, divided by k after the loop
    roc = accuracy = recall = precision = 0.0

    k = 10
    k_folds = StratifiedKFold(binary_class_series, k)
    for train_indexes, test_indexes in k_folds:
        train_set_dataframe = labeled_sample_features_dataframe.loc[train_indexes.tolist()]
        test_set_dataframe = labeled_sample_features_dataframe.loc[test_indexes.tolist()]

        train_class = binary_class_series[train_indexes]
        test_class = binary_class_series[test_indexes]
        selected_classifier = RandomForestClassifier(n_estimators=100)
        selected_classifier.fit(train_set_dataframe, train_class)
        predictions = selected_classifier.predict(test_set_dataframe)
        predictions_proba = selected_classifier.predict_proba(test_set_dataframe)

        # Column 1 holds the probability of the positive class
        roc += roc_auc_score(test_class, predictions_proba[:, 1])
        accuracy += accuracy_score(test_class, predictions)
        recall += recall_score(test_class, predictions)
        precision += precision_score(test_class, predictions)

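Finally, I divide the results by K to get the average AUC, precision, and so on. This code works fine. However, I cannot compute the same values for the multi-class case: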

    train_class = multi_class_series[train_indexes]
    test_class = multi_class_series[test_indexes]

    selected_classifier = RandomForestClassifier(n_estimators=100)
    selected_classifier.fit(train_set_dataframe, train_class)

    predictions = selected_classifier.predict(test_set_dataframe)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)

I found that for multi-class I have to pass "weighted" as the average parameter:

    roc += roc_auc_score(test_class, predictions_proba[:,1], average="weighted")

I get an error: raise ValueError("{0} format is not supported".format(y_type))

ValueError: multiclass format is not supported

Best answer

You can't use roc_auc as a single summary metric for multiclass models. If you want, you can compute a per-class roc_auc instead, as follows:

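    # One list of AUC values per class (for example, one entry per fold)
    roc = {label: [] for label in multi_class_series.unique()}
    for label in multi_class_series.unique():
        # One-vs-rest: refit with "is this sample of class `label`?" as the target
        selected_classifier.fit(train_set_dataframe, train_class == label)
        predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
        # Score against the same binarized target; append, because roc[label] is a list
        roc[label].append(roc_auc_score(test_class == label, predictions_proba[:, 1]))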
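
The loop above refits a binary model for every class. As a rough alternative sketch, the same one-vs-rest idea can be applied to a single multiclass fit by scoring each column of predict_proba against the matching binarized label. The synthetic data from make_classification and the names X_train, y_test, clf, and per_class_auc are assumptions made for illustration; they are not part of the question. The last line relies on the multi_class argument of roc_auc_score, which is only available in scikit-learn 0.22 and later, so it would still fail on the version that produced the error in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for the asker's dataframe (an assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# predict_proba returns one column per class, ordered like clf.classes_.
proba = clf.predict_proba(X_test)

# One-vs-rest AUC per class: binarize the true labels against each class
# and score them with that class's probability column.
per_class_auc = {
    label: roc_auc_score((y_test == label).astype(int), proba[:, i])
    for i, label in enumerate(clf.classes_)
}

# Unweighted mean of the per-class values (macro average).
macro_auc = np.mean(list(per_class_auc.values()))

# Single summary value; needs scikit-learn 0.22+ for the multi_class argument.
weighted_auc = roc_auc_score(y_test, proba, multi_class="ovr", average="weighted")
```

With average="weighted", the summary value is the per-class AUCs averaged by class support, so it can be cross-checked against per_class_auc by hand.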

However, it is more common to use sklearn.metrics.confusion_matrix to evaluate the performance of a multiclass model.
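
A minimal sketch of that evaluation, again on synthetic data rather than the asker's dataframe (the dataset and the names clf, cm, and predictions are illustrative assumptions). It also shows precision and recall with average="weighted", since those scorers need an explicit averaging mode once the target has more than two classes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, confusion_matrix, precision_score, recall_score,
)
from sklearn.model_selection import train_test_split

# Synthetic 3-class data (an assumption, not the data from the question).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

# Rows are true classes, columns are predicted classes, ordered as in `labels`.
cm = confusion_matrix(y_test, predictions, labels=clf.classes_)

accuracy = accuracy_score(y_test, predictions)
# Per-class scores are averaged with weights proportional to class support.
precision = precision_score(y_test, predictions, average="weighted")
recall = recall_score(y_test, predictions, average="weighted")

print(cm)
print(accuracy, precision, recall)
```

The diagonal of cm counts correct predictions per class, so its trace divided by the total equals the accuracy; support-weighted recall reduces to the same number.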

Regarding "python - Computing sklearn.roc_auc_score for multi-class", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39685740/
