
scikit-learn RandomForestClassifier: probabilistic prediction vs. majority vote

Reposted. Author: 行者123. Updated: 2023-12-01 02:13:51

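In section 1.9.2.1 of the scikit-learn documentation (excerpted below), why does the random forest implementation differ from Breiman's original paper? As far as I know, when aggregating the ensemble of classifiers, Breiman opted for the majority vote (the mode) for classification and the average for regression, according to the paper by Liaw and Wiener, the maintainers of the original R code (cited below).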

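  • Why does scikit-learn use probabilistic prediction instead of majority vote?
  • Is there any advantage to using probabilistic prediction?

  • The section in question: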

    In contrast to the original publication [B2001], the scikit-learn implementation combines classifiers by averaging their probabilistic prediction, instead of letting each classifier vote for a single class.



    Source: Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.

    Best answer

    This question has now been answered on Cross Validated. The answer is included here for reference:

    Such questions are always best answered by looking at the code, if you're fluent in Python.

    RandomForestClassifier.predict, at least in the current version 0.16.1, predicts the class with highest probability estimate, as given by predict_proba. (this line)

    The documentation for predict_proba says:

    The predicted class probabilities of an input sample is computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
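The two statements quoted above can be checked directly on a fitted forest. The following is a minimal sketch, assuming a reasonably recent scikit-learn release; the synthetic dataset, its size, and the random seeds are arbitrary choices made for illustration and do not come from the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic problem; sizes and seeds are arbitrary (illustration only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Forest probabilities compared with the mean of the per-tree probabilities.
tree_mean = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
proba = forest.predict_proba(X)
proba_matches_tree_mean = np.allclose(proba, tree_mean)

# predict compared with the class of highest averaged probability.
argmax_labels = forest.classes_[np.argmax(proba, axis=1)]
predict_matches_argmax = np.array_equal(forest.predict(X), argmax_labels)

print(proba_matches_tree_mean, predict_matches_argmax)
```

If both flags come out true, predict is behaving as the argmax of predict_proba rather than as a count of per-tree votes.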



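    The difference from the original method is probably just so that
    predict gives predictions consistent with predict_proba. The
    result is sometimes called "soft voting", as opposed to the "hard"
    majority vote used in the original Breiman paper. In a quick
    search I couldn't find a proper comparison of the performance of the
    two methods, but they both seem fairly reasonable in this situation.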
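To make the distinction concrete, here is a small standard-library sketch of the two aggregation rules. The function names and the per-tree probabilities are made up for illustration; they are chosen so that the two rules disagree, which real forests may do only rarely.

```python
from collections import Counter

def soft_vote(tree_probas):
    """Average the per-tree class probabilities, then pick the largest mean."""
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    mean_proba = [sum(p[c] for p in tree_probas) / n_trees for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean_proba[c])

def hard_vote(tree_probas):
    """Let each tree vote for its own most probable class, then take the mode."""
    votes = [max(range(len(p)), key=lambda c: p[c]) for p in tree_probas]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical probabilities for one sample from three trees, two classes.
tree_probas = [[0.1, 0.9], [0.6, 0.4], [0.6, 0.4]]

print(soft_vote(tree_probas))  # 1: mean probabilities are about [0.43, 0.57]
print(hard_vote(tree_probas))  # 0: two of the three trees prefer class 0
```

One confident tree outweighs two lukewarm ones under soft voting, while hard voting ignores how confident each tree is.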

    The predict documentation is at best quite misleading; I've
    submitted a pull request to fix it.

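    If you want to do majority vote prediction instead, here's a function
    to do it. Call it like predict_majvote(clf, X) instead of
    clf.predict(X). (Based on predict_proba; only lightly tested, but
    I think it should work.)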
    import numpy as np
    from scipy.stats import mode
    # Private scikit-learn helpers as of the 0.16-era release cited above;
    # later releases have moved or removed some of these import paths.
    from sklearn.ensemble.forest import _partition_estimators, _parallel_helper
    from sklearn.tree._tree import DTYPE
    from sklearn.externals.joblib import Parallel, delayed
    from sklearn.utils import check_array
    from sklearn.utils.validation import check_is_fitted

    def predict_majvote(forest, X):
        """Predict class for X.

        Uses majority voting, rather than the soft voting scheme
        used by RandomForestClassifier.predict.

        Parameters
        ----------
        forest : fitted RandomForestClassifier
            The forest whose trees cast the votes.
        X : array-like or sparse matrix of shape = [n_samples, n_features]
            The input samples. Internally, it will be converted to
            ``dtype=np.float32`` and if a sparse matrix is provided
            to a sparse ``csr_matrix``.

        Returns
        -------
        y : array of shape = [n_samples] or [n_samples, n_outputs]
            The predicted classes.
        """
        check_is_fitted(forest, 'n_outputs_')

        # Check data
        X = check_array(X, dtype=DTYPE, accept_sparse="csr")

        # Assign chunk of trees to jobs
        n_jobs, n_trees, starts = _partition_estimators(forest.n_estimators,
                                                        forest.n_jobs)

        # Parallel loop: collect every tree's predicted (encoded) class
        all_preds = Parallel(n_jobs=n_jobs, verbose=forest.verbose,
                             backend="threading")(
            delayed(_parallel_helper)(e, 'predict', X, check_input=False)
            for e in forest.estimators_)

        # Reduce: most common prediction across trees for each sample
        modes, counts = mode(all_preds, axis=0)
        # The trees return float-encoded class indices. Drop the leading axis
        # that mode() may keep, and cast to int so the values can index classes_.
        modes = np.asarray(modes).reshape(all_preds[0].shape).astype(int)

        if forest.n_outputs_ == 1:
            return forest.classes_.take(modes, axis=0)
        else:
            # classes_ is a list of arrays (one per output) in this case
            n_samples = all_preds[0].shape[0]
            preds = np.zeros((n_samples, forest.n_outputs_),
                             dtype=forest.classes_[0].dtype)
            for k in range(forest.n_outputs_):
                preds[:, k] = forest.classes_[k].take(modes[:, k], axis=0)
            return preds

    On the dumb synthetic case I tried, the predictions agreed with the
    predict method every time.
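For readers who would rather not depend on private scikit-learn helpers, the same idea can be sketched with public attributes only. This is not the answer author's function: the name predict_majority_vote is hypothetical, it handles only the single-output case, it runs the trees sequentially, and it assumes that each tree in forest.estimators_ predicts an encoded class index (0, 1, 2, ...) into forest.classes_, which is how the forest's sub-estimators appear to behave.

```python
import numpy as np

def predict_majority_vote(forest, X):
    """Hard-voting prediction for a fitted single-output forest classifier.

    Sketch only: assumes every tree predicts an index into forest.classes_.
    """
    classes = np.asarray(forest.classes_)
    # Shape (n_trees, n_samples): the encoded class each tree votes for.
    votes = np.stack([np.asarray(tree.predict(X)).astype(int)
                      for tree in forest.estimators_])
    # Shape (n_classes, n_samples): number of trees voting for each class.
    counts = np.stack([(votes == c).sum(axis=0) for c in range(len(classes))])
    # Ties go to the lowest class index, since argmax returns the first maximum.
    return classes[counts.argmax(axis=0)]
```

With a fitted classifier clf, it would be called as predict_majority_vote(clf, X) in place of clf.predict(X). Comparing the two outputs on your own data is a quick way to see how often soft and hard voting actually disagree.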

    Source: a similar question on Stack Overflow about scikit-learn RandomForestClassifier probabilistic prediction vs. majority vote: https://stackoverflow.com/questions/26899274/
