python - XGBoost algorithm: a question about the evaluate_model function

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:20:32

This evaluate_model function is used quite often; I found it used here at IBM. I will reproduce the function below:

import matplotlib.pyplot as plt
import pandas as pd
import xgboost as xgb
from sklearn import metrics

def evaluate_model(alg, train, target, predictors, useTrainCV=True, cv_folds=5, early_stopping_rounds=50):

    if useTrainCV:
        # Use xgb.cv with early stopping to choose the number of boosting rounds
        xgb_param = alg.get_xgb_params()
        xgtrain = xgb.DMatrix(train[predictors].values, label=target['Default Flag'].values)
        cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=alg.get_params()['n_estimators'], nfold=cv_folds,
                          metrics='auc', early_stopping_rounds=early_stopping_rounds, verbose_eval=True)
        alg.set_params(n_estimators=cvresult.shape[0])

    # Fit the algorithm on the data that was passed in
    # (recent XGBoost releases may expect eval_metric on the estimator rather than in fit())
    alg.fit(train[predictors], target['Default Flag'], eval_metric='auc')

    # Predict on the same data the model was just fitted on
    dtrain_predictions = alg.predict(train[predictors])
    dtrain_predprob = alg.predict_proba(train[predictors])[:, 1]

    # Print model report (both numbers are training scores):
    print("\nModel Report")
    print("Accuracy : %.6g" % metrics.accuracy_score(target['Default Flag'].values, dtrain_predictions))
    print("AUC Score (Train): %f" % metrics.roc_auc_score(target['Default Flag'], dtrain_predprob))

    # Plot feature importance (split counts per feature)
    plt.figure(figsize=(12, 12))
    feat_imp = pd.Series(alg.get_booster().get_fscore()).sort_values(ascending=False)
    feat_imp.plot(kind='bar', title='Feature Importance', color='g')
    plt.ylabel('Feature Importance Score')
    plt.show()

After tuning the XGBoost parameters, I have

from xgboost import XGBClassifier

xgb4 = XGBClassifier(
    objective="binary:logistic",
    learning_rate=0.10,
    n_estimators=5000,  # was misspelled "n_esimators", so the intended value was not applied
    max_depth=6,
    min_child_weight=1,
    gamma=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    reg_alpha=0.1,
    nthread=4,
    scale_pos_weight=1.0,
    seed=27)
# Use every column except the label and the ID as a predictor
features = [x for x in X_train.columns if x not in ['Default Flag', 'ID']]
evaluate_model(xgb4, X_train, y_train, features)

The result I get is

Model Report
Accuracy : 0.803236
AUC Score (Train): 0.856995

My question may be uninformed, but I found it odd that this evaluate_model() function is never run against a test set. When I call it on the test set (evaluate_model(xgb4, X_test, y_test, features)) I get

Model Report
Accuracy : 0.873706
AUC Score (Train): 0.965286

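Given that the test set shows higher accuracy than the training set, I wonder whether these two model reports are related at all. I apologize if this question is poorly structured.

Best answer

Let me elaborate on my answer:

This function trains on the dataset you pass to it and returns the training accuracy and AUC, so it is not a reliable way to evaluate a model.

In the link you provided, the function is described as a way to tune the number of estimators: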

The function below performs the following actions to find the best number of boosting trees to use on your data:

  • Trains an XGBoost model using features of the data.
  • Performs k-fold cross validation on the model, using accuracy and AUC score as the evaluation metric.
  • Returns output for each boosting round so you can see how the model is learning. You will look at the detailed output in the next
    section.
  • It stops running after the cross-validation score does not improve significantly with additional boosting rounds, giving you an
    optimal number of estimators for the model.
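
The stopping rule in the last bullet can be sketched without any library. This is a simplified model of the idea, not XGBoost's own code: the function name best_num_rounds, the patience parameter, and the AUC values are all made up for illustration.

```python
def best_num_rounds(scores, patience):
    """Return how many boosting rounds to keep, given one CV score per round.

    scores[i] is the mean cross-validated metric (higher is better, e.g. AUC)
    after round i + 1. Scanning stops once `patience` consecutive rounds have
    failed to beat the best score seen so far.
    """
    best_score = float("-inf")
    best_index = -1
    for i, score in enumerate(scores):
        if score > best_score:
            best_score, best_index = score, i
        elif i - best_index >= patience:
            break  # no improvement for `patience` rounds: stop early
    return best_index + 1  # a count of rounds, not a zero-based index


# Hypothetical per-round mean test AUC values, invented for this example.
cv_auc = [0.70, 0.74, 0.77, 0.78, 0.78, 0.77, 0.78, 0.76]
n_rounds = best_num_rounds(cv_auc, patience=3)
print(n_rounds)
```

In evaluate_model, the analogous value is cvresult.shape[0], which is then written back with alg.set_params(n_estimators=...). Note that this step only chooses a hyperparameter; it does not produce a final performance estimate.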

You should not use it to evaluate your model's performance; run a clean cross-validation instead.
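
As a rough illustration of what a clean cross-validation means, here is a library-free sketch: every fold is scored by a model that never saw it during fitting. The helper names (k_fold_indices, cross_val_accuracy) and the majority-class toy model are assumptions made for this example; in practice scikit-learn's cross_val_score or StratifiedKFold would normally take their place.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs that partition range(n_samples) into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_val_accuracy(fit, predict, X, y, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold, return the per-fold accuracies."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return scores


# Toy stand-in model (hypothetical): always predict the majority training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, X_test):
    return [model] * len(X_test)


X = [[i] for i in range(20)]
y = [1] * 15 + [0] * 5
fold_scores = cross_val_accuracy(fit_majority, predict_majority, X, y, k=5)
mean_score = sum(fold_scores) / len(fold_scores)
print(fold_scores, mean_score)
```

The mean of the per-fold scores is the figure to report, since each score comes from rows held out of the corresponding fit.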

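In this case your "test" score is higher because evaluate_model refits the model on whatever data it receives: the second report is a training score computed on the test set, and since the test set is smaller, the model overfits it more easily.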
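
The difference between the two ways of scoring can be shown with a deliberately extreme toy model. This sketch assumes a hypothetical "memorizer" that stores every row it is fitted on; it is not XGBoost, and the data is invented, but the fit-then-score-on-the-same-rows pattern is the one evaluate_model follows.

```python
def fit_memorizer(X, y):
    """'Train' by storing every row with its label (an extreme case of overfitting)."""
    return {tuple(row): label for row, label in zip(X, y)}


def predict_memorizer(model, X, default=0):
    """Return the stored label for rows seen during fitting, else a default guess."""
    return [model.get(tuple(row), default) for row in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: the label is the parity of the single feature.
toy_X_train = [[i] for i in range(0, 40)]
toy_y_train = [i % 2 for i in range(0, 40)]
toy_X_test = [[i] for i in range(40, 50)]
toy_y_test = [i % 2 for i in range(40, 50)]

# Pattern used by evaluate_model: fit on the data passed in, score on the same data.
refit_on_test = fit_memorizer(toy_X_test, toy_y_test)
score_refit = accuracy(toy_y_test, predict_memorizer(refit_on_test, toy_X_test))

# Held-out evaluation: fit on the training rows only, then score on unseen rows.
fit_on_train = fit_memorizer(toy_X_train, toy_y_train)
score_held_out = accuracy(toy_y_test, predict_memorizer(fit_on_train, toy_X_test))

print(score_refit, score_held_out)
```

The refit score is perfect only because the model has already seen every test row; the held-out score is the one that says something about generalization.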

Regarding "python - XGBoost algorithm: a question about the evaluate_model function", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54559567/
