
python - Multiple scoring metrics with sklearn xgboost gridsearchcv

Reposted. Author: 行者123. Updated: 2023-12-01 09:24:35

How do I run a grid search with sklearn and xgboost and get back several metrics (ideally at the F1 threshold)?

See my code below. I can't find what I'm doing wrong, and I don't understand the error.

# ---- just making up a dataset here ----
from sklearn import datasets

from sklearn.metrics import precision_score, recall_score, accuracy_score, roc_auc_score, make_scorer
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.model_selection import train_test_split
from sklearn.grid_search import RandomizedSearchCV

import xgboost as xgb

X, y = datasets.make_classification(n_samples=100000, n_features=20,
                                    n_informative=2, n_redundant=10,
                                    random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.99,
                                                    random_state=42)

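What follows is a set of parameters and then the randomized grid search. If I change `scoring_evals` to 'roc_auc', it works. If I try what seems to be the documented approach for multiple metrics, I get an error. Where am I going wrong?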

Also, how can I make sure these metrics are reported at the F1 threshold?

params = {
    'min_child_weight': [0.5, 1.0, 3.0, 5.0, 7.0, 10.0],
    'gamma': [0, 0.25, 0.5, 1.0],
    'reg_lambda': [0.1, 1.0, 5.0, 10.0, 50.0, 100.0],  # was listed twice; one entry is enough
    'max_depth': [2, 4, 6, 10],
    'learning_rate': [0.05, 0.1, 0.2, 0.3, 0.4],
    'colsample_bytree': [1, .8, .5],
    'subsample': [0.8],
    'n_estimators': [50]
}


folds = 5
max_models = 5

scoring_evals = {'AUC': 'roc_auc', 'Accuracy': make_scorer(accuracy_score), 'Precision': make_scorer(precision_score),'Recall': make_scorer(recall_score)}


xgb_algo = xgb.XGBClassifier()
random_search = RandomizedSearchCV(xgb_algo,
                                   param_distributions=params, n_iter=max_models,
                                   scoring=scoring_evals, n_jobs=4, cv=5,
                                   verbose=False, random_state=2018)

random_search.fit(X_train, y_train)

My error is:

ValueError: scoring value should either be a callable, string or None. {'AUC': 'roc_auc', 'Accuracy': make_scorer(accuracy_score), 'Precision': make_scorer(precision_score), 'Recall': make_scorer(recall_score)} was passed

Best answer

First, check which version of scikit-learn you are using. If it is v0.19, you are using a deprecated module.
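A quick way to run that check is sketched below. It prints the installed version and reports whether the legacy `sklearn.grid_search` module can still be found. The variable name `legacy_present` is only illustrative, and the sketch assumes nothing beyond scikit-learn being installed.

```python
import importlib.util

import sklearn

# Installed scikit-learn version, e.g. to compare against 0.19.
print(sklearn.__version__)

# find_spec returns None when the submodule does not exist in this install.
legacy_present = importlib.util.find_spec("sklearn.grid_search") is not None
print("sklearn.grid_search available:", legacy_present)
```

If the legacy module is reported as unavailable, the import in the question fails outright rather than raising the scoring error, so `sklearn.model_selection` is the only option in that case.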

You are doing this:

from sklearn.grid_search import RandomizedSearchCV

You must have received a warning like this:

DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. ... ... ...

The classes in the grid_search module are outdated and deprecated, and they do not include the multi-metric functionality you are using.

Heed that warning and do this instead:

from sklearn.model_selection import RandomizedSearchCV

...
...
...

random_search = RandomizedSearchCV(xgb_algo,
                                   param_distributions=params,
                                   n_iter=max_models,
                                   scoring=scoring_evals, n_jobs=4, cv=5,
                                   verbose=False, random_state=2018, refit=False)

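Now look closely at the refit parameter. In a multi-metric setting you have to set it explicitly: the best hyperparameters can only be chosen by a single metric, so the search needs to know which metric to use before it can fit the final model.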

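If you do not want a final model and only want the performance of the models across the data and the different parameters, set it to False. Otherwise, set it to any key in your scoring dictionary.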
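The sketch below illustrates the second option, `refit` set to one of the scoring keys, and shows where the per-metric results end up. It is not the original poster's setup: to keep it dependent on scikit-learn alone, it uses `GradientBoostingClassifier` as a stand-in for `xgb.XGBClassifier()` (an assumption, swap it back if xgboost is installed), along with a much smaller dataset and a reduced parameter grid so that it runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (accuracy_score, make_scorer,
                             precision_score, recall_score)
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic dataset so the sketch finishes in a few seconds.
X, y = make_classification(n_samples=600, n_features=20,
                           n_informative=2, n_redundant=10,
                           random_state=42)

# Same scoring dictionary as in the question.
scoring_evals = {
    'AUC': 'roc_auc',
    'Accuracy': make_scorer(accuracy_score),
    'Precision': make_scorer(precision_score),
    'Recall': make_scorer(recall_score),
}

# Reduced grid (assumption): only parameters the stand-in estimator accepts.
params = {'max_depth': [2, 4, 6], 'learning_rate': [0.05, 0.1, 0.2, 0.3]}

random_search = RandomizedSearchCV(
    GradientBoostingClassifier(n_estimators=20, random_state=0),  # stand-in for XGBClassifier
    param_distributions=params,
    n_iter=3,
    scoring=scoring_evals,
    refit='AUC',  # pick the best candidate by AUC and refit it on all the data
    cv=3,
    random_state=2018,
)
random_search.fit(X, y)

# Every metric gets its own columns in cv_results_, keyed by the dict name.
results = random_search.cv_results_
for name in scoring_evals:
    print(name, results['mean_test_' + name])

# Available because refit names a metric.
print(random_search.best_params_)
```

With `refit=False`, `cv_results_` is still populated in the same way, but the search does not produce a refitted `best_estimator_`, so you would pick the winning candidate from `cv_results_` yourself.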

Regarding "python - Multiple scoring metrics with sklearn xgboost gridsearchcv", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50537651/
