
python - Finding a model's accuracy, precision, and recall after hyperparameter tuning in sklearn

Reposted. Author: 行者123. Updated: 2023-11-30 22:08:19

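I have a binary classification problem, for which I chose 3 algorithms: Logistic Regression, SVM, and AdaBoost. For each one I use grid search with k-fold cross-validation to find the best set of hyperparameters. Now I need to pick the best model based on accuracy, precision, and recall. The problem is that I can't find a suitable way to retrieve this information. My code is below: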

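from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.metrics import make_scorer, fbeta_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier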

# TODO: Initialize the classifier
clfr_A = LogisticRegression(random_state=128)
clfr_B = SVC(random_state=128)
clfr_C = AdaBoostClassifier(random_state=128)

lr_param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000] }
svc_param_grid = {'C': [0.001, 0.01, 0.1, 1, 10], 'gamma' : [0.001, 0.01, 0.1, 1]}
adb_param_grid = {'n_estimators' : [50,100,150,200,250,500],'learning_rate' : [.5,.75,1.0,1.25,1.5,1.75,2.0]}

# TODO: Make an fbeta_score scoring object using make_scorer()
scorer = make_scorer(fbeta_score, beta = 0.5)

# TODO: Perform grid search on the classifier using 'scorer' as the scoring method using GridSearchCV()
clfrs = [clfr_A, clfr_B, clfr_C]
params = [lr_param_grid, svc_param_grid, adb_param_grid]

for clfr, param in zip(clfrs, params):
    grid_obj = GridSearchCV(clfr, param, cv=3, scoring=scorer, refit=True)
    grid_fit = grid_obj.fit(features_raw, target_raw)
    print(grid_fit.best_estimator_)
    print(grid_fit.cv_results_)

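The problem is that cv_results_ returns a lot of information, but apart from mean_test_score I can't find anything relevant. In particular, I don't see any metric there related to accuracy, precision, or recall.

I can think of one way to get them. I could change the for loop to something like this: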

score_params = ["accuracy", "precision", "recall"]
for clfr, param in zip(clfrs, params):
    grid_obj = GridSearchCV(clfr, param, cv=3, scoring=scorer, refit=True)
    grid_fit = grid_obj.fit(features_raw, target_raw)
    best_clf = grid_fit.best_estimator_
    for score in score_params:
        # Re-run cross-validation on the best estimator, once per metric
        print(score, " : ", cross_val_score(best_clf, features_raw, target_raw, scoring=score, cv=3).mean())

But is there a better way? It seems I'm running the same work several times for each model. Any help would be appreciated.

Best answer

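GridSearchCV is doing exactly what you asked it to do. You passed F-beta as the scorer, so mean_test_score holds the F-beta result for each parameter combination. If you want access to other metrics, you need to tell GridSearchCV so explicitly.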
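To make this concrete, here is a minimal sketch of the single-scorer case. The synthetic data from make_classification, the small parameter grid, and max_iter=1000 are assumptions chosen for illustration; they stand in for features_raw, target_raw, and the grids from the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV

# Assumption: synthetic binary data in place of features_raw / target_raw.
X, y = make_classification(n_samples=200, n_features=8, random_state=128)

# Single custom scorer, as in the question.
scorer = make_scorer(fbeta_score, beta=0.5)

grid = GridSearchCV(
    LogisticRegression(random_state=128, max_iter=1000),
    {"C": [0.01, 1, 100]},
    cv=3,
    scoring=scorer,
)
grid.fit(X, y)

# With one scorer, the only test-score columns are the "..._test_score" ones,
# and they hold F-beta (beta=0.5), not accuracy, precision, or recall.
f_beta_per_candidate = grid.cv_results_["mean_test_score"]
for params, score in zip(grid.cv_results_["params"], f_beta_per_candidate):
    print(params, round(score, 3))
```

Each printed value is the mean F-beta score over the 3 folds for one value of C, which is why no accuracy, precision, or recall columns show up in cv_results_.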

GridSearchCV in newer versions of scikit-learn supports multi-metric scoring, so you can pass several scorers at once. As per the documentation:

scoring : string, callable, list/tuple, dict or None, default: None

... ...

For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values.

See the multi-metric evaluation example in the scikit-learn documentation, and change your scoring parameter to:

scoring = {'Accuracy': 'accuracy',
           'FBeta': make_scorer(fbeta_score, beta=0.5),
           # ... Add others here as you want.
           }

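But when you do this, you also need to change the refit parameter. Different metrics will rank the parameter combinations differently, so you have to decide which one is used to choose the estimator that gets refit. To do that, pass one of the keys of the scoring dict as refit: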

for clfr, param in zip(clfrs, params):
    # Pass the scoring dict (not the single scorer) and refit on one of its keys
    grid_obj = GridSearchCV(clfr, param, cv=3, scoring=scoring, refit='FBeta')
    ...
    ...
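Putting the pieces together, the following is a rough end-to-end sketch rather than a definitive implementation. The synthetic data, the reduced grids, the 'Precision' and 'Recall' keys, and the `candidates` and `summary` names are all assumptions added for illustration. With a dict passed as scoring, cv_results_ gets one mean_test_<key> column per metric, and best_index_ points at the candidate selected by the refit metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumption: synthetic binary data in place of features_raw / target_raw.
X, y = make_classification(n_samples=200, n_features=8, random_state=128)

# Keys are free-form labels; they become suffixes of the cv_results_ columns.
scoring = {
    "Accuracy": "accuracy",
    "Precision": "precision",
    "Recall": "recall",
    "FBeta": make_scorer(fbeta_score, beta=0.5),
}

# Hypothetical, reduced grids so the sketch runs quickly.
candidates = {
    "LogisticRegression": (
        LogisticRegression(random_state=128, max_iter=1000),
        {"C": [0.01, 1, 100]},
    ),
    "SVC": (SVC(random_state=128), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}),
}

summary = {}
for name, (clf, param_grid) in candidates.items():
    search = GridSearchCV(clf, param_grid, cv=3, scoring=scoring, refit="FBeta")
    search.fit(X, y)
    best = search.best_index_  # candidate with the best mean F-beta
    # Read every metric for that same candidate from the single search run.
    summary[name] = {
        metric: search.cv_results_["mean_test_" + metric][best]
        for metric in scoring
    }

for name, row in summary.items():
    print(name, {metric: round(value, 3) for metric, value in row.items()})
```

All metrics come from the same cross-validation run, so there is no need to call cross_val_score again for each metric after the search.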

Regarding "python - Finding a model's accuracy, precision, and recall after hyperparameter tuning in sklearn", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52196422/
