python - GridSearchCV not giving the same results as expected when compared to xgboost.cv


Reposted by: 太空狗. Updated: 2023-10-30 01:37:02


When I compare sklearn.GridSearchCV with xgboost.cv, I get different results. Below I explain what I am trying to do:

1) Import the libraries

import numpy as np
from sklearn import datasets
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
from xgboost.sklearn import XGBClassifier
from sklearn.model_selection import StratifiedKFold

2) Set the seed and the folds

seed = 5
n_fold_inner = 5
skf_inner = StratifiedKFold(n_splits=n_fold_inner, random_state=seed, shuffle=True)

3) Load the dataset

X, y = datasets.make_hastie_10_2(n_samples=12000, random_state=1)
X = X.astype(np.float32)

# map labels from {-1, 1} to {0, 1}
labels, y = np.unique(y, return_inverse=True)

X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]
dtrain  = xgb.DMatrix(X_train,  label=y_train, missing = np.nan)

4) Define the xgboost parameters

fixed_parameters = {
               'max_depth':3,
               'min_child_weight':3,
               'learning_rate':0.3,
               'colsample_bytree':0.8,
               'subsample':0.8,
               'gamma':0,
               'max_delta_step':0,
               'colsample_bylevel':1,
               'reg_alpha':0,
               'reg_lambda':1,
               'scale_pos_weight':1,
               'base_score':0.5,
               'seed':5,
               'objective':'binary:logistic',
               'silent': 1}

5) The parameters I grid search over (only one, the number of estimators)

params_grid = {
               'n_estimators':np.linspace(1, 20, 20).astype('int')
               }

6) Run the grid search

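bst_grid = GridSearchCV(
            estimator=XGBClassifier(**fixed_parameters),param_grid=params_grid,n_jobs=4,
            # return_train_score=True is needed for the mean_train_score / std_train_score keys read below;
            # the iid argument is dropped because recent scikit-learn releases no longer accept it
            cv=skf_inner,scoring='roc_auc',return_train_score=True,refit=False,verbose=1)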

bst_grid.fit(X_train,y_train)

best_params_grid_search = bst_grid.best_params_
best_score_grid_search = bst_grid.best_score_


means_train = bst_grid.cv_results_['mean_train_score']
stds_train = bst_grid.cv_results_['std_train_score']
means_test = bst_grid.cv_results_['mean_test_score']
stds_test = bst_grid.cv_results_['std_test_score']

7) Print the results

print('\ntest-auc-mean  test-auc-std  train-auc-mean  train-auc-std')
for idx in range(0, len(means_test)):
    print(means_test[idx], stds_test[idx], means_train[idx], stds_train[idx])

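8) Now I run xgb.cv for 20 rounds with the same parameters as before (20 rounds matches the n_estimators values I passed to the grid search). The problem is that I get different results...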

num_rounds = 20
best_params_grid_search['objective']= 'binary:logistic'
best_params_grid_search['silent']= 1
cv_xgb = xgb.cv(best_params_grid_search,dtrain,num_boost_round =num_rounds,folds=skf_inner,metrics={'auc'},seed=seed,maximize=True)
print(cv_xgb)

GridSearchCV results (each row uses n estimators: 1, 2, 3, ..., 20)

test-auc-mean  test-auc-std  train-auc-mean  train-auc-std
0.610051313783 0.0161039540435 0.644057288587 0.0113345992869
0.69201880047 0.0162563563448 0.736006666658 0.00692672815659
0.745466211655 0.0171675737271 0.796345885396 0.00696679302744
0.783959748994 0.00705320521545 0.841463145757 0.00948465661336
0.814666429161 0.0205663250121 0.876016226998 0.00594191823748
0.834757856446 0.0380407635359 0.89839145346 0.0119466187041
0.846589877247 0.0250769570711 0.918506450202 0.00400934458132
0.856519550489 0.02076405634 0.929968936282 0.00287173282935
0.874262106553 0.0270140215944 0.940190511945 0.00335749381638
0.884796282407 0.0242102758081 0.947369708661 0.00274634034559
0.890833683342 0.0240690598159 0.953708404754 0.00332080069217
0.898287157179 0.0212975975614 0.958794323829 0.00463360376002
0.905931348284 0.0240526927266 0.963055575138 0.00385161158711
0.911782932073 0.0169788764956 0.966542306102 0.00274612227499
0.912551138778 0.0175200936415 0.969060984867 0.00135518880398
0.915046588665 0.0169918459539 0.971904231381 0.00177694652262
0.917921423036 0.0131486037603 0.975162276052 0.0025983006922
0.921909172729 0.0113192686772 0.976056924526 0.0022670828819
0.928131653291 0.0117709832599 0.978585868159 0.00211167800105
0.931493562339 0.0119475329984 0.98098486872 0.00186032225868

xgb.cv results

    test-auc-mean  test-auc-std  train-auc-mean  train-auc-std
0        0.669881      0.013938        0.772116       0.011315
1        0.759682      0.019225        0.883394       0.004381
2        0.798337      0.016992        0.939274       0.005196
3        0.827751      0.007224        0.962461       0.007382
4        0.850340      0.011451        0.978809       0.001102
5        0.864438      0.020012        0.986584       0.000858
6        0.879706      0.014168        0.991765       0.001926
7        0.889308      0.013851        0.994663       0.000970
8        0.897973      0.011383        0.996704       0.000481
9        0.903878      0.012139        0.997494       0.000432
10       0.909599      0.010234        0.998301       0.000602
11       0.912682      0.014475        0.998972       0.000306
12       0.914289      0.014122        0.999392       0.000207
13       0.916273      0.011744        0.999568       0.000185
14       0.918050      0.011219        0.999718       0.000140
15       0.922161      0.011968        0.999788       0.000146
16       0.922990      0.010124        0.999863       0.000085
17       0.924221      0.009026        0.999893       0.000082
18       0.925718      0.008859        0.999929       0.000060
19       0.926104      0.007586        0.999959       0.000030
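
The two tables can be lined up row by row: xgb.cv reports its metrics after each boosting round, so row i of its output corresponds to i + 1 trees, that is, to the GridSearchCV row for n_estimators = i + 1. The sketch below copies three rows from the tables above and prints the gaps; the name `auc_gaps` and the choice of rows are only for illustration.

```python
# Rows copied from the two result tables above:
# (n_estimators, GridSearchCV test AUC, xgb.cv test AUC,
#  GridSearchCV train AUC, xgb.cv train AUC)
rows = [
    (1, 0.610051313783, 0.669881, 0.644057288587, 0.772116),
    (10, 0.884796282407, 0.903878, 0.947369708661, 0.997494),
    (20, 0.931493562339, 0.926104, 0.98098486872, 0.999959),
]


def auc_gaps(rows):
    """Return (n_estimators, test gap, train gap); each gap is xgb.cv minus GridSearchCV."""
    return [
        (n, xgb_test - gs_test, xgb_train - gs_train)
        for n, gs_test, xgb_test, gs_train, xgb_train in rows
    ]


gaps = auc_gaps(rows)
for n, test_gap, train_gap in gaps:
    print(f"n_estimators={n:2d}  test gap={test_gap:+.4f}  train gap={train_gap:+.4f}")
```

On these three rows the train AUC gap is larger than the test AUC gap each time, which may suggest that the two runs fit models of different capacity rather than the same model scored on different folds.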

Best answer

num_boost_round is the number of boosting iterations (i.e. n_estimators). xgboost.cv ignores n_estimators in the params and overrides it with num_boost_round.

Try this:

cv_xgb = xgb.cv(best_params_grid_search,dtrain,num_boost_round =best_params_grid_search['n_estimators'],folds=skf_inner,metrics={'auc'},seed=seed,maximize=True)
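
One further point may be worth checking. `best_params_` only contains the keys of `param_grid`, so in the question `best_params_grid_search` holds just `n_estimators`. The values in `fixed_parameters` (for example `max_depth` of 3) never reach xgb.cv, which then falls back to XGBoost's defaults (a `max_depth` of 6, for instance). The sketch below assumes the intent is to reuse the same fixed hyperparameters in both runs. The helper name `split_cv_arguments` is hypothetical, and the dictionary is shortened for illustration.

```python
def split_cv_arguments(fixed_parameters, best_params):
    """Merge the fixed and searched parameters, then pull n_estimators out.

    Returns (params, num_boost_round): params is the dict for xgb.cv and
    num_boost_round is the number of boosting rounds to request.
    """
    merged = dict(fixed_parameters)   # copy, so the caller's dict is not modified
    merged.update(best_params)        # searched values take precedence
    num_boost_round = int(merged.pop("n_estimators"))
    return merged, num_boost_round


# Shortened stand-ins for the dictionaries in the question (illustrative only).
fixed_parameters = {
    "max_depth": 3,
    "learning_rate": 0.3,
    "subsample": 0.8,
    "objective": "binary:logistic",
}
best_params_grid_search = {"n_estimators": 20}

params, num_boost_round = split_cv_arguments(fixed_parameters, best_params_grid_search)
print(params)
print(num_boost_round)
```

With xgboost installed, the result would then be passed on as `xgb.cv(params, dtrain, num_boost_round=num_boost_round, folds=skf_inner, metrics={'auc'}, seed=seed)`. Because `subsample` and `colsample_bytree` introduce randomness, the two sets of scores may still not match digit for digit.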

For python - GridSearchCV not giving the same results as expected when compared to xgboost.cv, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41939144/
