python - Extending xgboost.XGBClassifier

Reposted by: 太空宇宙. Updated: 2023-11-03 14:49:43

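I am trying to define a class called XGBExtended that extends xgboost.XGBClassifier, the scikit-learn API for xgboost. I am running into some issues with the get_params method. Below is an IPython session illustrating the problem. Basically, get_params seems to return only the attributes I define within XGBExtended.__init__, and the attributes defined during the parent init method (xgboost.XGBClassifier.__init__) are ignored. I am using IPython and running Python 2.7. Full system specs are at the bottom.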

In [182]: import xgboost as xgb
     ...: 
     ...: class XGBExtended(xgb.XGBClassifier):
     ...:   def __init__(self, foo):
     ...:     super(XGBExtended, self).__init__()
     ...:     self.foo = foo
     ...: 
     ...: clf = XGBExtended(foo = 1)
     ...: 
     ...: clf.get_params()
     ...: 
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-182-431c4c3f334b> in <module>()
      8 clf = XGBExtended(foo = 1)
      9 
---> 10 clf.get_params()

/Users/andrewhannigan/lib/xgboost/python-package/xgboost/sklearn.pyc in get_params(self, deep)
    188         if isinstance(self.kwargs, dict):  # if kwargs is a dict, update params accordingly
    189             params.update(self.kwargs)
--> 190         if params['missing'] is np.nan:
    191             params['missing'] = None  # sklearn doesn't handle nan. see #4725
    192         if not params.get('eval_metric', True):

KeyError: 'missing'

So I hit an error because 'missing' is not a key in the params dict inside the XGBClassifier.get_params method. I entered the debugger to take a look:

In [183]: %debug
> /Users/andrewhannigan/lib/xgboost/python-package/xgboost/sklearn.py(190)get_params()
    188         if isinstance(self.kwargs, dict):  # if kwargs is a dict, update params accordingly
    189             params.update(self.kwargs)
--> 190         if params['missing'] is np.nan:
    191             params['missing'] = None  # sklearn doesn't handle nan. see #4725
    192         if not params.get('eval_metric', True):

ipdb> params
{'foo': 1}
ipdb> self.__dict__
{'n_jobs': 1, 'seed': None, 'silent': True, 'missing': nan, 'nthread': None, 'min_child_weight': 1, 'random_state': 0, 'kwargs': {}, 'objective': 'binary:logistic', 'foo': 1, 'max_depth': 3, 'reg_alpha': 0, 'colsample_bylevel': 1, 'scale_pos_weight': 1, '_Booster': None, 'learning_rate': 0.1, 'max_delta_step': 0, 'base_score': 0.5, 'n_estimators': 100, 'booster': 'gbtree', 'colsample_bytree': 1, 'subsample': 1, 'reg_lambda': 1, 'gamma': 0}
ipdb>

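As you can see, params contains only the foo variable. However, the object itself contains all of the parameters defined by xgboost.XGBClassifier.__init__. But for some reason the BaseEstimator.get_params method, which is called from xgboost.XGBClassifier.get_params, only picks up the parameters defined explicitly in XGBExtended.__init__. Unfortunately, even when I explicitly call get_params with deep = True, it still does not work correctly: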

ipdb> super(XGBModel, self).get_params(deep=True)
{'foo': 1}
ipdb>

Can anyone tell me why this is happening?

System specs:

In [186]: print IPython.sys_info()
{'commit_hash': u'1149d1700',
 'commit_source': 'installation',
 'default_encoding': 'UTF-8',
 'ipython_path': '/Users/andrewhannigan/virtualenvironment/nimble_ai/lib/python2.7/site-packages/IPython',
 'ipython_version': '5.4.1',
 'os_name': 'posix',
 'platform': 'Darwin-14.5.0-x86_64-i386-64bit',
 'sys_executable': '/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python',
 'sys_platform': 'darwin',
 'sys_version': '2.7.10 (default, Jul  3 2015, 12:05:53) \n[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)]'}

Best answer

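The problem here is that the subclass is declared incorrectly. When you declare an __init__ method that takes only foo, you override the original one. scikit-learn's BaseEstimator.get_params discovers parameter names by inspecting the signature of the estimator class's own __init__, so it reports only foo, even though the base class constructor did run and stored its default values as attributes on the instance.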
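
To see why only foo shows up, here is a minimal stdlib-only sketch that imitates, in simplified form, how scikit-learn's BaseEstimator.get_params collects parameter names from the __init__ signature of the concrete class. MiniEstimator, Parent, and Child are hypothetical stand-ins, not real xgboost or scikit-learn classes, and the sketch assumes Python 3 (for inspect.signature) rather than the Python 2.7 used in the question.

```python
import inspect


class MiniEstimator(object):
    """Hypothetical, simplified stand-in for sklearn.base.BaseEstimator."""

    @classmethod
    def _param_names(cls):
        # Only the __init__ of the concrete class is inspected; parent
        # signatures are never consulted.
        sig = inspect.signature(cls.__init__)
        return sorted(
            name for name, p in sig.parameters.items()
            if name != "self"
            and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
        )

    def get_params(self):
        # Read each discovered parameter back from the instance attributes.
        return {name: getattr(self, name) for name in self._param_names()}


class Parent(MiniEstimator):
    def __init__(self, max_depth=3, missing=None):
        self.max_depth = max_depth
        self.missing = missing


class Child(Parent):
    # Overriding __init__ with only `foo` hides the parent's parameters
    # from the signature-based lookup above.
    def __init__(self, foo):
        super().__init__()
        self.foo = foo


child = Child(foo=1)
print(child.get_params())   # {'foo': 1}
print(sorted(vars(child)))  # ['foo', 'max_depth', 'missing']
```

The instance carries all three attributes, but the lookup only knows about the names in Child.__init__, which matches what the debugger session in the question shows.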

You should use something like the following:

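import xgboost as xgb

class XGBExtended(xgb.XGBClassifier):
    def __init__(self, foo, max_depth=3, learning_rate=0.1,
                 n_estimators=100, silent=True,
                 objective="binary:logistic",
                 nthread=-1, gamma=0, min_child_weight=1,
                 max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1,
                 reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
                 base_score=0.5, seed=0, missing=None, **kwargs):

        # Pass the parent's parameters to the super class by keyword, so the
        # call does not depend on the parameter order of the installed
        # xgboost version
        super(XGBExtended, self).__init__(
            max_depth=max_depth, learning_rate=learning_rate,
            n_estimators=n_estimators, silent=silent, objective=objective,
            nthread=nthread, gamma=gamma, min_child_weight=min_child_weight,
            max_delta_step=max_delta_step, subsample=subsample,
            colsample_bytree=colsample_bytree,
            colsample_bylevel=colsample_bylevel,
            reg_alpha=reg_alpha, reg_lambda=reg_lambda,
            scale_pos_weight=scale_pos_weight, base_score=base_score,
            seed=seed, missing=missing, **kwargs)

        # Store the custom parameter under the same name as the argument
        self.foo = foo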

After that you will not get any error.

clf = XGBExtended(foo = 1)
print(clf.get_params(deep=True))

>>> {'reg_alpha': 0, 'colsample_bytree': 1, 'silent': True, 
     'colsample_bylevel': 1, 'scale_pos_weight': 1, 'learning_rate': 0.1, 
     'missing': None, 'max_delta_step': 0, 'nthread': -1, 'base_score': 0.5, 
     'n_estimators': 100, 'subsample': 1, 'reg_lambda': 1, 'seed': 0, 
     'min_child_weight': 1, 'objective': 'binary:logistic', 
     'foo': 1, 'max_depth': 3, 'gamma': 0}
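
Because the parent's parameter list differs between xgboost versions, it can help to check which parent parameters a subclass signature leaves out. The following is a rough stdlib-only sketch of such a check; dropped_params and init_param_names are hypothetical helpers, the toy classes merely stand in for XGBClassifier and its subclasses, and Python 3 is assumed.

```python
import inspect


def init_param_names(cls):
    """Named parameters of cls.__init__, excluding self, *args and **kwargs."""
    params = inspect.signature(cls.__init__).parameters.values()
    return {
        p.name for p in params
        if p.name != "self"
        and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    }


def dropped_params(child_cls, parent_cls):
    """Hypothetical helper: parent __init__ parameters absent from the child."""
    return sorted(init_param_names(parent_cls) - init_param_names(child_cls))


# Toy classes standing in for XGBClassifier and two ways of subclassing it.
class Parent(object):
    def __init__(self, max_depth=3, missing=None, **kwargs):
        self.max_depth = max_depth
        self.missing = missing
        self.kwargs = kwargs


class BrokenChild(Parent):
    def __init__(self, foo):
        super().__init__()
        self.foo = foo


class FixedChild(Parent):
    def __init__(self, foo, max_depth=3, missing=None, **kwargs):
        super().__init__(max_depth=max_depth, missing=missing, **kwargs)
        self.foo = foo


print(dropped_params(BrokenChild, Parent))  # ['max_depth', 'missing']
print(dropped_params(FixedChild, Parent))   # []
```

With the real classes, the same call would take the form dropped_params(XGBExtended, xgb.XGBClassifier); any names it lists will not appear in get_params for the subclass.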

Regarding python - Extending xgboost.XGBClassifier, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45950630/
