
python - How can the sub-sample size in a SciKit-Learn random forest be equal to the original training data size?

Reposted. Author: 太空狗. Updated: 2023-10-29 23:58:06

The documentation for the SciKit-Learn Random Forest classifier states:

The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default).

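What I don't understand is how we can talk about random selection if the sample size is always the same as the input sample size. There seems to be no selection at all, because every tree would be trained on all of the (naturally identical) samples.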

Am I missing something?

Best answer

I believe this part of the documentation answers your question:

In random forests (see RandomForestClassifier and RandomForestRegressor classes), each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set. In addition, when splitting a node during the construction of the tree, the split that is chosen is no longer the best split among all features. Instead, the split that is picked is the best split among a random subset of the features. As a result of this randomness, the bias of the forest usually slightly increases (with respect to the bias of a single non-random tree) but, due to averaging, its variance also decreases, usually more than compensating for the increase in bias, hence yielding an overall better model.
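The second source of randomness in the quoted passage, picking the best split among a random subset of the features, can be illustrated with a minimal sketch. The helper name `candidate_features` and the square-root subset size are assumptions made for illustration (the square-root rule is a common heuristic, not something the quoted passage specifies), and this is not scikit-learn's internal code.

```python
import math
import random


def candidate_features(n_features, rng):
    """Pick a random subset of feature indices to consider at one split.

    Hypothetical helper: the sqrt(n_features) subset size is an assumed
    heuristic, not taken from the quoted documentation.
    """
    k = max(1, int(math.sqrt(n_features)))
    # sample() draws without replacement, so no feature is listed twice.
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)  # fixed seed, only so the run is repeatable
n_features = 16

# Each split draws its own subset, so different nodes may see different features.
split_1 = candidate_features(n_features, rng)
split_2 = candidate_features(n_features, rng)
print(split_1)
print(split_2)
```

Each call returns 4 of the 16 feature indices; the best split is then searched only among those candidates.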

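The key to understanding this is the phrase "sample drawn with replacement". It means that each instance can be drawn more than once. That in turn means that some instances of the training set appear several times in a tree's sample, while others do not appear at all (these are the out-of-bag instances). So the sample has the same size as the training set but not the same contents, and which instances are repeated or left out differs from tree to tree.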
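The effect of drawing with replacement can be checked with a small simulation. This is a sketch using only Python's standard library, not scikit-learn's implementation; the helper name `bootstrap_indices`, the sample count, and the seed are assumptions chosen for illustration.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


n_samples = 10_000
rng = random.Random(0)  # fixed seed, only so the run is repeatable

# One bootstrap sample per (hypothetical) tree.
tree_a = bootstrap_indices(n_samples, rng)
tree_b = bootstrap_indices(n_samples, rng)

# Distinct rows that tree A actually sees, and the rows it never sees.
in_bag_a = set(tree_a)
out_of_bag_a = set(range(n_samples)) - in_bag_a
unique_fraction_a = len(in_bag_a) / n_samples

print(len(tree_a))                  # same size as the training set
print(round(unique_fraction_a, 3))  # share of distinct rows in the sample
print(len(out_of_bag_a))            # rows left out of tree A's sample
print(in_bag_a != set(tree_b))      # whether the two trees see different rows
```

For large training sets, the expected share of distinct rows in a bootstrap sample approaches 1 - 1/e, roughly 63%, so about a third of the rows are out-of-bag for any given tree even though the sample has the full size.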

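Regarding "python - How can the sub-sample size in a SciKit-Learn random forest be equal to the original training data size?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35827446/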
