python - Unclear documentation on random forests in Scikit-Learn

Reposted. Author: 太空狗. Updated: 2023-10-30 01:21:49

In Scikit-Learn's documentation on ensemble methods (http://scikit-learn.org/stable/modules/ensemble.html#id6), in section 1.9.2.3, Parameters, we read:

(...) The best results are also usually reached when setting max_depth=None in combination with min_samples_split=1 (i.e., when fully developing the trees). Bear in mind though that these values are usually not optimal. The best parameter values should always be cross-validated.

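So what is the difference between "best results" and "optimal"? I assumed that by "best results" the author means the best cross-validated prediction results.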

In addition, note that bootstrap samples are used by default in random forests (bootstrap=True) while the default strategy is to use the original dataset for building extra-trees (bootstrap=False).

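I understand this as follows: bootstrapping is used by default in Scikit-Learn's implementation, but the default strategy is not to use bootstrapping. If so, where does the default strategy come from, and why is it not the default in the implementation?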

Best answer

I agree that the first quote is self-contradictory. Maybe the following would be better:

The best results are also often reached with fully developed trees (max_depth=None and min_samples_split=1). Bear in mind though that these values are usually not guaranteed to be optimal. The best parameter values should always be cross-validated.
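As an illustration of "the best parameter values should always be cross-validated", here is a minimal sketch that compares fully developed trees against shallower ones with a grid search. The synthetic dataset, the grid values, and the number of trees are arbitrary choices for the example and do not come from the documentation. It also assumes a recent scikit-learn release, where the smallest integer accepted for min_samples_split is 2 rather than the 1 quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, used only to make the sketch self-contained.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# max_depth=None with min_samples_split=2 corresponds to fully developed trees
# (assumption: recent scikit-learn, where min_samples_split=1 is rejected).
param_grid = {
    "max_depth": [None, 3, 5],
    "min_samples_split": [2, 10],
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=25, random_state=0),
    param_grid,
    cv=3,  # 3-fold cross-validation, default accuracy scoring
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, best_score)
```

Whether the fully developed setting wins depends on the data, which is the reason the values have to be cross-validated instead of assumed.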

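The second quote compares the default value of the bootstrap parameter for random forests (RandomForestClassifier and RandomForestRegressor) with the default for extremely randomized trees, as implemented in the classes ExtraTreesClassifier and ExtraTreesRegressor. The following might be more explicit: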

In addition, note that bootstrap samples are used by default in random forests (bootstrap=True) while for building extra-trees the default strategy is to use the original dataset (bootstrap=False).
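The two defaults can be checked directly on the estimators. The short sketch below only reads the bootstrap parameter of freshly constructed estimators through get_params(). The dictionary name `defaults` is just an illustrative variable.

```python
from sklearn.ensemble import (
    ExtraTreesClassifier,
    ExtraTreesRegressor,
    RandomForestClassifier,
    RandomForestRegressor,
)

# Collect the default value of `bootstrap` for each estimator class.
defaults = {
    cls.__name__: cls().get_params()["bootstrap"]
    for cls in (
        RandomForestClassifier,
        RandomForestRegressor,
        ExtraTreesClassifier,
        ExtraTreesRegressor,
    )
}

for name, value in defaults.items():
    print(f"{name}: bootstrap={value}")
```

Passing bootstrap explicitly, for example ExtraTreesClassifier(bootstrap=True), overrides the default for either family.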

If you find these formulations easier to understand, feel free to submit a PR with the fix.

Regarding "python - Unclear documentation on random forests in Scikit-Learn", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28411976/
