python - Why does calling fit reset the custom objective function in XGBClassifier?

Reposted by: 行者123 Updated: 2023-12-03 16:21:48

I am trying to set up the XGBoost sklearn API class XGBClassifier to use a custom objective function (brier), following the documentation:

    .. note::  Custom objective function

        A custom objective function can be provided for the ``objective``
        parameter. In this case, it should have the signature
        ``objective(y_true, y_pred) -> grad, hess``:

        y_true: array_like of shape [n_samples]
            The target values
        y_pred: array_like of shape [n_samples]
            The predicted values

        grad: array_like of shape [n_samples]
            The value of the gradient for each sample point.
        hess: array_like of shape [n_samples]
            The value of the second derivative for each sample point
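As an illustration of the signature described in the note, here is a minimal sketch using plain Python lists. The name `squared_error_objective` is made up for this example and is not part of xgboost; it simply differentiates the per-sample loss 0.5 * (y_pred - y_true)^2 with respect to the prediction.

```python
def squared_error_objective(y_true, y_pred):
    """Hypothetical objective following objective(y_true, y_pred) -> grad, hess.

    Per-sample loss: 0.5 * (y_pred - y_true) ** 2
    """
    # First derivative of the loss with respect to the prediction.
    grad = [p - t for t, p in zip(y_true, y_pred)]
    # Second derivative is constant for squared error.
    hess = [1.0 for _ in y_true]
    return grad, hess


grad, hess = squared_error_objective([1.0, 0.0, 1.0], [0.5, 0.25, 1.0])
print(grad)  # [-0.5, 0.25, 0.0]
print(hess)  # [1.0, 1.0, 1.0]
```

Both return values have one entry per sample, which is the shape the note asks for.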

Here is my attempt:

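import os

import numpy as np
from sklearn.datasets import load_svmlight_file
from xgboost import XGBClassifier

# "~" is not expanded automatically when the file is opened, so expand it explicitly
train_data = load_svmlight_file(os.path.expanduser('~/agaricus.txt.train'))
X = train_data[0].toarray()
y = train_data[1]

def brier(y_true, y_pred):
    # treat y_pred as a raw score and map it to a probability
    y_pred = 1.0 / (1.0 + np.exp(-y_pred))
    # first and second derivatives of (y_pred - y_true)**2 with respect to the raw score
    grad = 2 * y_pred * (y_true - y_pred) * (y_pred - 1)
    hess = 2 * y_pred * (1 - y_pred) * (2 * y_pred * (y_true + 1) - y_true - 3 * y_pred ** 2)
    return grad, hess

m = XGBClassifier(objective=brier, seed=42)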

This seems to produce the correct object:

XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None, gamma=None,
              gpu_id=None, importance_type='gain', interaction_constraints=None,
              learning_rate=None, max_delta_step=None, max_depth=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              objective=<function brier at 0x7fe7ac418290>, random_state=None,
              reg_alpha=None, reg_lambda=None, scale_pos_weight=None, seed=42,
              subsample=None, tree_method=None, validate_parameters=False,
              verbosity=None)

However, calling the .fit method seems to reset the m object to the default settings:

m.fit(X, y)
m
XGBClassifier(base_score=0.5, booster=None, colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints=None,
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=0, num_parallel_tree=1,
              objective='binary:logistic', random_state=42, reg_alpha=0,
              reg_lambda=1, scale_pos_weight=1, seed=42, subsample=1,
              tree_method=None, validate_parameters=False, verbosity=None)

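The object now has objective='binary:logistic'. I noticed this while investigating why my Brier score is worse when I optimize brier directly than when I use the default binary:logistic, as described here.

So, how do I correctly set up XGBClassifier to use my function brier as a custom objective?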
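Before looking at the API, it can help to rule out the derivatives themselves. The sketch below is a sanity check written for this page, not part of the original question: it compares analytic derivatives of (sigmoid(x) - y)^2 against central finite differences, using only the standard library. The Hessian here is written with the product p * (1 - p), which is what differentiating the gradient gives; all function names are chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def brier_loss(y, x):
    """Squared error between the label y and the sigmoid of the raw score x."""
    return (sigmoid(x) - y) ** 2


def brier_grad_hess(y, x):
    """Analytic first and second derivatives of brier_loss with respect to x."""
    p = sigmoid(x)
    grad = 2 * p * (y - p) * (p - 1)
    hess = 2 * p * (1 - p) * (2 * p * (y + 1) - y - 3 * p ** 2)
    return grad, hess


def numeric_grad_hess(y, x, eps=1e-4):
    """Central finite differences, used only as a reference."""
    f_plus = brier_loss(y, x + eps)
    f_mid = brier_loss(y, x)
    f_minus = brier_loss(y, x - eps)
    grad = (f_plus - f_minus) / (2 * eps)
    hess = (f_plus - 2 * f_mid + f_minus) / eps ** 2
    return grad, hess


# Largest disagreement between analytic and numeric values over a small grid.
max_err = 0.0
for y in (0.0, 1.0):
    for x in (-2.0, -0.5, 0.0, 0.7, 3.0):
        g_a, h_a = brier_grad_hess(y, x)
        g_n, h_n = numeric_grad_hess(y, x)
        max_err = max(max_err, abs(g_a - g_n), abs(h_a - h_n))

print(max_err < 1e-5)
```

A large discrepancy in a check like this would point at the formulas rather than at how the objective is passed to the classifier.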

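Best answer

I believe you are confusing objective with the objective function (obj as a parameter); the xgboost documentation can be confusing at times.

In short, you just need to fix this: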

m = XGBClassifier(obj=brier, seed=42)

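Going a little deeper, objective is how xgboost optimizes given an objective function. Usually xgboost infers the optimization from the number of classes in the y vector.

I took a snippet from the source code; as you can see, whenever you have only two classes, objective is set to binary:logistic: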

class XGBClassifier(XGBModel, XGBClassifierBase):
    def __init__(self, objective="binary:logistic", **kwargs):
        super().__init__(objective=objective, **kwargs)

    def fit(self, X, y, sample_weight=None, base_margin=None,
            eval_set=None, eval_metric=None,
            early_stopping_rounds=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, callbacks=None):

        evals_result = {}
        self.classes_ = np.unique(y)
        self.n_classes_ = len(self.classes_)

        xgb_options = self.get_xgb_params() # <-- obj function is set here

        if callable(self.objective):
            obj = _objective_decorator(self.objective) # <----- here is the mismatch of the names: if you pass your brier function as objective, it will become "binary:logistic"
            xgb_options["objective"] = "binary:logistic"
        else:
            obj = None

        if self.n_classes_ > 2:
            xgb_options['objective'] = 'multi:softprob' # <----- objective is being set here if n_classes> 2
            xgb_options['num_class'] = self.n_classes_

+-- 35 lines: feval = eval_metric if callable(eval_metric) else None ---

        self._Booster = train(xgb_options, train_dmatrix, # <----- objective is being passed in xgb_options dictionary
                              self.get_num_boosting_rounds(),
                              evals=evals,
                              early_stopping_rounds=early_stopping_rounds,
                              evals_result=evals_result, obj=obj, feval=feval, # <----- obj function is being passed to lower level api here
                              verbose_eval=verbose, xgb_model=xgb_model,
                              callbacks=callbacks)

+-- 12 lines: self.objective = xgb_options["objective"] ---

        return self
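The control flow in the snippet can be mimicked without xgboost. The sketch below is a toy stand-in: `ToyClassifier`, `toy_train`, `wrap_objective`, and `toy_objective` are hypothetical names, and only the lines visible in the snippet are mirrored (the callable check, the string placed in the options dict, the `obj` argument handed to the training call, and the assignment `self.objective = xgb_options["objective"]` that opens the second folded region). Everything hidden in the folds is left out.

```python
def wrap_objective(func):
    """Hypothetical stand-in for _objective_decorator: adapts
    func(y_true, y_pred) to a (preds, labels) calling convention."""
    def inner(preds, labels):
        return func(labels, preds)
    return inner


def toy_train(options, labels, obj=None):
    """Hypothetical stand-in for the lower-level train() call.
    It only records which objective it was given."""
    preds = [0.0 for _ in labels]
    if obj is not None:
        grad, hess = obj(preds, labels)
        return {"used": "custom", "grad": grad, "hess": hess}
    return {"used": options["objective"]}


class ToyClassifier:
    def __init__(self, objective="binary:logistic"):
        self.objective = objective

    def fit(self, y):
        options = {"objective": self.objective}
        if callable(self.objective):
            obj = wrap_objective(self.objective)
            options["objective"] = "binary:logistic"
        else:
            obj = None
        self.booster_ = toy_train(options, y, obj=obj)
        # Mirrors the first line of the second folded region in the snippet.
        self.objective = options["objective"]
        return self


def toy_objective(y_true, y_pred):
    grad = [p - t for t, p in zip(y_true, y_pred)]
    hess = [1.0 for _ in y_true]
    return grad, hess


m = ToyClassifier(objective=toy_objective)
print(callable(m.objective))  # True
m.fit([1.0, 0.0])
print(m.objective)            # binary:logistic
print(m.booster_["used"])     # custom
```

In this toy, the attribute ends up holding the string while the wrapped callable is what the stand-in training call receives. Whether a given xgboost release behaves the same way should be checked against its own source.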

There is a fixed list of objectives that you can set:

objective [default=reg:squarederror]

reg:squarederror: regression with squared loss.

reg:squaredlogerror: regression with squared log loss $\frac{1}{2}[\log(pred + 1) - \log(label + 1)]^2$. All input labels are required to be greater than -1. Also, see metric rmsle for possible issue with this objective.

reg:logistic: logistic regression

binary:logistic: logistic regression for binary classification, output probability

binary:logitraw: logistic regression for binary classification, output score before logistic transformation

binary:hinge: hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.

count:poisson: Poisson regression for count data, output mean of Poisson distribution

max_delta_step is set to 0.7 by default in poisson regression (used to safeguard optimization)

survival:cox: Cox regression for right censored survival time data (negative values are considered right censored). Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional hazard function h(t) = h0(t) * HR).

multi:softmax: set XGBoost to do multiclass classification using the softmax objective, you also need to set num_class(number of classes)

multi:softprob: same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata * nclass matrix. The result contains predicted probability of each data point belonging to each class.

rank:pairwise: Use LambdaMART to perform pairwise ranking where the pairwise loss is minimized

rank:ndcg: Use LambdaMART to perform list-wise ranking where Normalized Discounted Cumulative Gain (NDCG) is maximized

rank:map: Use LambdaMART to perform list-wise ranking where Mean Average Precision (MAP) is maximized

reg:gamma: gamma regression with log-link. Output is a mean of gamma distribution. It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be gamma-distributed.

reg:tweedie: Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be Tweedie-distributed.
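To make one entry of the list concrete, the per-sample loss quoted for reg:squaredlogerror can be evaluated by hand. This is a small standard-library sketch, not xgboost's implementation; the function name is made up, and rejecting a prediction at or below -1 is a choice made here so that the logarithm is defined.

```python
import math


def squared_log_error(pred, label):
    """0.5 * (log(pred + 1) - log(label + 1)) ** 2 for a single sample."""
    if pred <= -1 or label <= -1:
        # The list entry requires labels > -1; the same bound is applied
        # to pred in this sketch so that log(pred + 1) is defined.
        raise ValueError("pred and label must be greater than -1")
    return 0.5 * (math.log1p(pred) - math.log1p(label)) ** 2


print(squared_log_error(3.0, 3.0))                   # 0.0
print(round(squared_log_error(math.e - 1, 0.0), 6))  # 0.5
```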

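To confirm that objective cannot be your brier function, I manually set objective to the brier function in the source code, right before the call to the lower-level API: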

class XGBClassifier(XGBModel, XGBClassifierBase):
    def __init__(self, objective="binary:logistic", **kwargs):
        super().__init__(objective=objective, **kwargs)

    def fit(self, X, y, sample_weight=None, base_margin=None,
            eval_set=None, eval_metric=None,
            early_stopping_rounds=None, verbose=True, xgb_model=None,
            sample_weight_eval_set=None, callbacks=None):

+-- 54 lines: evals_result = {} ---
        xgb_options["objective"] = xgb_options["obj"]
        self._Booster = train(xgb_options, train_dmatrix,
                              self.get_num_boosting_rounds(),
                              evals=evals,
                              early_stopping_rounds=early_stopping_rounds,
                              evals_result=evals_result, obj=obj, feval=feval,
                              verbose_eval=verbose, xgb_model=xgb_model,
                              callbacks=callbacks)

+-- 14 lines: self.objective = xgb_options["objective"] ---

This raises the following error:

    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [10:09:53] /private/var/folders/z5/mchb9bz51cx3h97nkw9v0wkr0000gn/T/pip-install-kh801rm0/xgboost/xgboost/src/objective/objective.cc:26: Unknown objective function: `<function brier at 0x10b630d08>`
Objective candidate: binary:hinge
Objective candidate: multi:softmax
Objective candidate: multi:softprob
Objective candidate: rank:pairwise
Objective candidate: rank:ndcg
Objective candidate: rank:map
Objective candidate: reg:squarederror
Objective candidate: reg:squaredlogerror
Objective candidate: reg:logistic
Objective candidate: binary:logistic
Objective candidate: binary:logitraw
Objective candidate: reg:linear
Objective candidate: count:poisson
Objective candidate: survival:cox
Objective candidate: reg:gamma
Objective candidate: reg:tweedie
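The message suggests that the value stored under objective reached the core as its string form and was then looked up by name. A rough sketch of that kind of lookup, under that assumption, is shown below. `KNOWN_OBJECTIVES` and `lookup_objective` are hypothetical names; the set is copied from the candidates printed in the error, and the real check lives in objective.cc.

```python
# Names copied from the "Objective candidate" lines of the error message.
KNOWN_OBJECTIVES = {
    "binary:hinge", "multi:softmax", "multi:softprob", "rank:pairwise",
    "rank:ndcg", "rank:map", "reg:squarederror", "reg:squaredlogerror",
    "reg:logistic", "binary:logistic", "binary:logitraw", "reg:linear",
    "count:poisson", "survival:cox", "reg:gamma", "reg:tweedie",
}


def lookup_objective(value):
    """Hypothetical name lookup: the value is treated as a string."""
    name = str(value)
    if name not in KNOWN_OBJECTIVES:
        raise ValueError(f"Unknown objective function: `{name}`")
    return name


def brier(y_true, y_pred):
    """Placeholder callable; only its string form matters here."""
    return y_pred, y_pred


print(lookup_objective("binary:logistic"))
try:
    lookup_objective(brier)
except ValueError as err:
    # The string form of a function looks like "<function brier at 0x...>"
    print(err)
```

A Python function has no registered name in such a table, which is consistent with the callable having to travel through the separate obj argument of the training call instead.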

Regarding "python - Why does calling fit reset the custom objective function in XGBClassifier?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61067351/
