
python - How does the predict_proba() function in LightGBM work internally?

Reposted. Author: 行者123. Updated: 2023-12-03 16:55:40

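This question is about understanding, internally, how LightGBM predicts the probabilities for a class.
Other packages, such as sklearn, provide thorough detail for their classifiers. For example: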

  • LogisticRegression returns:

  • Probability estimates.

    The returned estimates for all classes are ordered by the label of classes.

    For a multi_class problem, if multi_class is set to be “multinomial” the softmax function is used to find the predicted probability of each class. Else use a one-vs-rest approach, i.e. calculate the probability of each class assuming it to be positive using the logistic function, and normalize these values across all the classes.


  • RandomForest returns:

  • Predict class probabilities for X.

    The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.


    There are also other Stack Overflow questions that provide additional detail, such as:
  • Support Vector Machines
  • Multilayer Perceptron
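
As a rough illustration of the two scikit-learn descriptions quoted above, the sketch below uses only the standard library. The scores and per-tree class fractions are made-up numbers, and the helper names (`softmax`, `one_vs_rest`, `forest_proba`) are hypothetical, not scikit-learn functions.

```python
import math

def softmax(scores):
    """Multinomial case: exponentiate the per-class scores and normalize."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def one_vs_rest(scores):
    """One-vs-rest case: logistic function per class, then normalize."""
    probs = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(probs)
    return [p / total for p in probs]

def forest_proba(tree_probas):
    """Random forest case: average the class distributions of the trees."""
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    return [sum(t[k] for t in tree_probas) / n_trees for k in range(n_classes)]

# Hypothetical decision scores of one sample for three classes.
scores = [2.0, 0.5, -1.0]
# Hypothetical class fractions in the leaf reached in each of three trees.
tree_probas = [[0.75, 0.25], [0.5, 0.5], [1.0, 0.0]]

softmax_probs = softmax(scores)
ovr_probs = one_vs_rest(scores)
forest_probs = forest_proba(tree_probas)
```

Both normalizations yield a distribution that sums to one, but they generally assign different probabilities to the same scores, which is why the documentation spells out which one is used.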

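  • I am trying to uncover the same details for LightGBM's predict_proba function. The documentation does not list the details of how the probabilities are calculated.
    The documentation simply states: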

    Return the predicted probability for each class for each sample.


    The source code is below:
    def predict_proba(self, X, raw_score=False, start_iteration=0, num_iteration=None,
                      pred_leaf=False, pred_contrib=False, **kwargs):
        """Return the predicted probability for each class for each sample.

        Parameters
        ----------
        X : array-like or sparse matrix of shape = [n_samples, n_features]
            Input features matrix.
        raw_score : bool, optional (default=False)
            Whether to predict raw scores.
        start_iteration : int, optional (default=0)
            Start index of the iteration to predict.
            If <= 0, starts from the first iteration.
        num_iteration : int or None, optional (default=None)
            Total number of iterations used in the prediction.
            If None, if the best iteration exists and start_iteration <= 0, the best iteration is used;
            otherwise, all iterations from ``start_iteration`` are used (no limits).
            If <= 0, all iterations from ``start_iteration`` are used (no limits).
        pred_leaf : bool, optional (default=False)
            Whether to predict leaf index.
        pred_contrib : bool, optional (default=False)
            Whether to predict feature contributions.

            .. note::

                If you want to get more explanations for your model's predictions using SHAP values,
                like SHAP interaction values,
                you can install the shap package (https://github.com/slundberg/shap).
                Note that unlike the shap package, with ``pred_contrib`` we return a matrix with an extra
                column, where the last column is the expected value.

        **kwargs
            Other parameters for the prediction.

        Returns
        -------
        predicted_probability : array-like of shape = [n_samples, n_classes]
            The predicted probability for each class for each sample.
        X_leaves : array-like of shape = [n_samples, n_trees * n_classes]
            If ``pred_leaf=True``, the predicted leaf of every tree for each sample.
        X_SHAP_values : array-like of shape = [n_samples, (n_features + 1) * n_classes] or list with n_classes length of such objects
            If ``pred_contrib=True``, the feature contributions for each sample.
        """
        result = super(LGBMClassifier, self).predict(X, raw_score, start_iteration, num_iteration,
                                                     pred_leaf, pred_contrib, **kwargs)
        if callable(self._objective) and not (raw_score or pred_leaf or pred_contrib):
            warnings.warn("Cannot compute class probabilities or labels "
                          "due to the usage of customized objective function.\n"
                          "Returning raw scores instead.")
            return result
        elif self._n_classes > 2 or raw_score or pred_leaf or pred_contrib:
            return result
        else:
            return np.vstack((1. - result, result)).transpose()
    How can I understand how LightGBM's predict_proba function works internally?

    Best answer

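    LightGBM, like all gradient boosting methods for classification, essentially combines decision trees and logistic regression. We start with the same logistic function that represents the probabilities (the sigmoid, which is the two-class special case of softmax): P(y = 1 | X) = 1 / (1 + exp(-Xw)). The interesting twist is that the feature matrix X is composed of the terminal nodes of a decision tree ensemble. These are all then weighted by w, the parameters that must be learned. The mechanism used to learn the weights depends on the precise learning algorithm used. Similarly, the construction of X also depends on the algorithm. LightGBM, for example, introduced two novel features that gave it performance improvements over XGBoost: "Gradient-based One-Side Sampling" and "Exclusive Feature Bundling". In general, though, each row collects the terminal leaves for one sample, and the columns represent the terminal leaves.
    So this is what the documentation might say...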
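
To make the binary case concrete, here is a minimal standard-library sketch of that idea. It assumes the default binary objective with no extra scaling of the raw score; the leaf values are invented for illustration and `leaf_values` is a hypothetical name, not a LightGBM attribute. Under this view, Xw for one sample is simply the sum of the values of the leaves that sample reaches, one leaf per tree.

```python
import math

def sigmoid(z):
    """Logistic function mapping a raw score to P(y = 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical leaf values: leaf_values[i][t] is the value of the leaf
# that sample i reaches in tree t (two samples, three trees).
leaf_values = [
    [0.4, -0.1, 0.3],
    [-0.6, -0.2, 0.1],
]

# Raw score (margin) per sample: the sum over all trees.
raw_scores = [sum(row) for row in leaf_values]

# Probability of the positive class.
p_positive = [sigmoid(z) for z in raw_scores]

# Two-column output [P(y = 0), P(y = 1)], the same layout that
# np.vstack((1. - result, result)).transpose() produces in the source.
proba = [[1.0 - p, p] for p in p_positive]
```

If this reading is right, calling `predict_proba` with `raw_score=True` should return the summed leaf values, and applying the logistic function to them should reproduce the second column of the default output.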

    Probability estimates.

    The predicted class probabilities of an input sample are computed as the softmax of the weighted terminal leaves from the decision tree ensemble corresponding to the provided sample.
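
For more than two classes, the proposed wording can be sketched as follows. This assumes the built-in multiclass objective, where each class has its own set of trees and the per-class raw scores are passed through softmax; the numbers and the name `class_leaf_values` are hypothetical.

```python
import math

# Hypothetical leaf values for ONE sample: class_leaf_values[k] holds the
# values of the leaves reached in the trees that belong to class k.
class_leaf_values = [
    [0.5, 0.2],    # class 0
    [-0.3, 0.1],   # class 1
    [-0.4, -0.2],  # class 2
]

# Raw score per class: sum of that class's leaf values.
raw = [sum(values) for values in class_leaf_values]

# Softmax over the per-class raw scores (shifted by the max for stability).
shift = max(raw)
exps = [math.exp(r - shift) for r in raw]
total = sum(exps)
class_proba = [e / total for e in exps]

# The predicted label is the class with the highest probability.
predicted_class = max(range(len(class_proba)), key=class_proba.__getitem__)
```

This matches the `self._n_classes > 2` branch in the source, which returns the result unchanged because it already has one column per class.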


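    For further details you would have to delve into the specifics of boosting, XGBoost, and finally the LightGBM paper, but that seems a bit heavy-handed given the other documentation examples you provided.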

    Regarding "python - How does the predict_proba() function in LightGBM work internally?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63490533/
