
scikit-learn - How do I find the key trees/features from a trained random forest?

Reposted — Author: 行者123, updated 2023-12-03 18:12:00

I am using the Scikit-Learn Random Forest Classifier and trying to extract the meaningful trees/features in order to better understand the prediction results.

I found this method in the documentation ( http://scikit-learn.org/dev/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.get_params ), but could not find an example of how to use it.

If possible, I would also like to visualize those trees; any relevant code would be great.

Thanks!

Best Answer

I think you are looking for Forest.feature_importances_. This lets you see the relative importance of each input feature to the final model. Here is a simple example.

import random
import numpy as np
from sklearn.ensemble import RandomForestClassifier


# Let's set up a training dataset. We'll make 100 rows, each consisting of a
# 0/1 class label followed by 19 features. For rows classified as "1" we'll
# artificially fix the first 3 features to set values, so that we know these
# are the "important" features. If we do it right, the model should point out
# these three as important. The rest of the features will just be noise.
train_data = []  # must be all floats
for _ in range(100):
    line = []
    if random.random() > 0.5:
        line.append(1.0)
        # Add 3 features that we know indicate a row classified as "1".
        line.append(.77)
        line.append(.33)
        line.append(.55)
        for _ in range(16):  # fill in the rest with noise
            line.append(random.random())
    else:
        # This is a "0" row, so fill it entirely with noise.
        line.append(0.0)
        for _ in range(19):
            line.append(random.random())
    train_data.append(line)
train_data = np.array(train_data)


# Create the random forest object which will include all the parameters
# for the fit. (Older scikit-learn versions required compute_importances=True
# here; that parameter has since been removed, and feature importances are
# now always computed.)
Forest = RandomForestClassifier(n_estimators=100)

# Fit the training data to the training output and create the decision
# trees. The first column in our data is the classification, and the rest
# of the columns are the features.
Forest = Forest.fit(train_data[:, 1:], train_data[:, 0])

# Now you can see the importance of each feature in Forest.feature_importances_.
# These values all add up to one. Let's call the "important" ones those that
# are above average.
important_features = []
for x, i in enumerate(Forest.feature_importances_):
    if i > np.average(Forest.feature_importances_):
        important_features.append(str(x))
print('Most important features:', ', '.join(important_features))
# We see that the model correctly detected that the first three features are
# the most important, just as we expected!

Regarding "scikit-learn - How do I find the key trees/features from a trained random forest?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/17057139/
