python - Used SequentialFeatureSelector but model accuracy did not improve

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:19:26

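I am selecting features to build a churn prediction model. Using SequentialFeatureSelector with a RandomForestClassifier as the estimator, I get an accuracy of 0.9517, and it reports the 16 features it selected.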

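However, when I fit a separate RandomForestClassifier on the same list of 16 features, the accuracy score it reports is 0.8714. Why is there such a large difference in accuracy even though I am using the same feature list that SequentialFeatureSelector selected?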

[2019-01-28 17:51:16] Features: 16/16 -- score: 0.9517879681082387
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 3.6s remaining: 0.0s

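from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.ensemble import RandomForestClassifier

# Base estimator. random_state=None means every fit draws fresh randomness.
# max_features='auto' in the original output equals 'sqrt' for classifiers and
# was removed in scikit-learn 1.3; min_impurity_split was removed in 1.0 and is omitted.
rand_forest = RandomForestClassifier(
    bootstrap=True, class_weight=None, criterion='gini',
    max_depth=None, max_features='sqrt', max_leaf_nodes=None,
    min_impurity_decrease=0.0,
    min_samples_leaf=1, min_samples_split=2,
    min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
    oob_score=False, random_state=None, verbose=0,
    warm_start=False)

# Forward selection of 16 features. cv=0 turns off cross-validation in mlxtend,
# so the reported score is measured on the same data the estimator was fitted on.
sfs = SequentialFeatureSelector(
    clone_estimator=True, cv=0,
    estimator=rand_forest,
    floating=False, forward=True, k_features=16, n_jobs=1,
    pre_dispatch='2*n_jobs', scoring='accuracy', verbose=2)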

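from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# x holds the 16 selected feature columns and y the churn labels, as described above.
# Hold out 20% of the rows for testing.
xtr, xtst, ytr, ytst = train_test_split(x, y, random_state=5, test_size=0.2)

# A fresh forest with no fixed random_state.
rfst = RandomForestClassifier(n_estimators=100)

rfstmodel = rfst.fit(xtr, ytr)

# Accuracy on the held-out test split.
rfstmodel.score(xtst, ytst)

>>> 0.8714975845410629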

Best answer

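Random forest classifiers randomize not only the features but also the splits made on those features, so even when your features stay the same, the splits are generated randomly on every fit, which can introduce some variance into the model. For a more regularized model with lower variance on average, I recommend a Gradient Boosted Model or, better still, XGBoost.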
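One way to gauge how much of a gap this run-to-run randomness could explain is to refit the same forest with several seeds and compare the held-out scores. The sketch below is only an illustration: it uses synthetic data from make_classification as a stand-in (the churn dataset from the question is not available), and the helper name score_with_seed is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data (assumption: 16 features, binary target).
X, y = make_classification(n_samples=1000, n_features=16,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5)


def score_with_seed(seed):
    """Fit a 100-tree forest with the given seed and return held-out accuracy."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


# Different seeds draw different bootstrap samples and feature subsets,
# so the held-out accuracy can move a little from fit to fit.
scores = np.array([score_with_seed(seed) for seed in range(5)])
print("per-seed accuracy:", scores)
print("spread (max - min):", scores.max() - scores.min())

# Reusing a seed reproduces the same forest and therefore the same score.
repeat_a = score_with_seed(42)
repeat_b = score_with_seed(42)
print("reproducible:", repeat_a == repeat_b)
```

If the spread across seeds turns out much smaller than the gap between 0.9517 and 0.8714, randomness alone is unlikely to account for it, and it is worth checking whether both scores were computed on the same kind of data (training, cross-validated, or held-out).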

Random Forest adds additional randomness to the model, while growing the trees. Instead of searching for the most important feature while splitting a node, it searches for the best feature among a random subset of features. This results in a wide diversity that generally results in a better model.

Therefore, in Random Forest, only a random subset of the features is taken into consideration by the algorithm for splitting a node. You can even make trees more random, by additionally using random thresholds for each feature rather than searching for the best possible thresholds (like a normal decision tree does).
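The two variants described in the quote can be compared side by side in scikit-learn. In this rough sketch, RandomForestClassifier searches for the best threshold within a random subset of features, while ExtraTreesClassifier additionally draws the candidate thresholds at random. The synthetic data and all parameter values are assumptions chosen for illustration, not settings from the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption: 16 features, binary target).
X, y = make_classification(n_samples=600, n_features=16,
                           n_informative=8, random_state=0)

models = {
    # Best threshold searched within a random subset of sqrt(n_features) features.
    "random forest": RandomForestClassifier(
        n_estimators=100, max_features="sqrt", random_state=0),
    # Same feature subsetting, but thresholds are drawn at random per feature.
    # Note: ExtraTreesClassifier does not bootstrap rows by default.
    "extra trees": ExtraTreesClassifier(
        n_estimators=100, max_features="sqrt", random_state=0),
}

cv_means = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy, so each score comes from unseen rows.
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_means[name] = fold_scores.mean()
    print(f"{name}: mean accuracy {fold_scores.mean():.3f} "
          f"(std {fold_scores.std():.3f})")
```

Which variant scores higher depends on the data, so the numbers printed here say nothing about the churn dataset itself.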

Source: https://towardsdatascience.com/the-random-forest-algorithm-d457d499ffcd

Regarding "python - Used SequentialFeatureSelector but model accuracy did not improve", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54423846/
