
python - RandomForest, how to choose the optimal n_estimators parameter


I want to train my model and select the best number of trees. Here is the code:

from sklearn.ensemble import RandomForestClassifier

tree_dep = [3,5,6]
tree_n = [2,5,7]

avg_rf_f1 = []
search = []

for x in tree_dep:
    for y in tree_n:
        search.append((a,b))
        rf_model = RandomForestClassifier(n_estimators=tree_n, max_depth=tree_dep, random_state=42)
        rf_scores = cross_val_score(rf_model, X_train, y_train, cv=10, scoring='f1_macro')

        avg_rf_f1.append(np.mean(rf_scores))

best_tree_dep, best_n = search[np.argmax(avg_rf_f1)]

The error occurs on this line:

rf_scores = cross_val_score(rf_model, X_train, y_train, cv=10, scoring='f1_macro')

ValueError: n_estimators must be an integer, got <class 'list'>.

I would like to know how to fix it. Thank you!!!

Best Answer

scikit-learn has a helper class called GridSearchCV that does exactly this: it takes the lists of parameter values you want to test and trains a classifier on every possible combination of them in order to return the best parameter set.
I would suggest it as a cleaner and faster approach than the nested loops you are implementing. It extends easily to other parameters (just add the desired parameters and their values to the grid) and it can be parallelized.
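
For reference, the immediate cause of the ValueError is that the whole lists tree_n and tree_dep are passed to the RandomForestClassifier constructor instead of the current loop values. A minimal sketch of the corrected loop, assuming X_train and y_train are already defined:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

tree_dep = [3, 5, 6]   # candidate max_depth values
tree_n = [2, 5, 7]     # candidate n_estimators values

avg_rf_f1 = []
search = []

for x in tree_dep:
    for y in tree_n:
        # record the (max_depth, n_estimators) pair being evaluated
        search.append((x, y))
        # pass the single integers y and x, not the whole lists
        rf_model = RandomForestClassifier(n_estimators=y, max_depth=x, random_state=42)
        rf_scores = cross_val_score(rf_model, X_train, y_train, cv=10, scoring='f1_macro')
        avg_rf_f1.append(np.mean(rf_scores))

best_tree_dep, best_n = search[np.argmax(avg_rf_f1)]

The GridSearchCV version of the same search looks like this: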

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

params_to_test = {
    'n_estimators': [2, 5, 7],
    'max_depth': [3, 5, 6]
}

# here you can put any parameter you want kept fixed at every run, like random_state or verbosity
rf_model = RandomForestClassifier(random_state=42)
# here you specify the CV parameters: number of folds, number of cores to use, ...
grid_search = GridSearchCV(rf_model, param_grid=params_to_test, cv=10, scoring='f1_macro', n_jobs=4)

grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_

#best_params is a dict you can pass directly to train a model with optimal settings
best_model = RandomForestClassifier(**best_params)

As pointed out in the comments, the best model is stored inside the grid_search object, so instead of creating a new model with:

best_model = RandomForestClassifier(**best_params)

we can simply use the one from grid_search:

best_model = grid_search.best_estimator_
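
As a quick usage note, the best cross-validated score is also stored on the search object, and the refitted best estimator can be used for prediction directly; a small sketch, assuming a held-out X_test exists:

# best mean f1_macro score found during the cross-validated search
print(grid_search.best_score_)

# the refitted best estimator can predict directly (X_test is an assumed held-out set)
y_pred = grid_search.best_estimator_.predict(X_test)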

Regarding "python - RandomForest, how to choose the optimal n_estimators parameter", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52513495/
