
python - Decision path of a random forest classifier

Reposted. Author: 行者123. Updated: 2023-11-30 08:34:43

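Here is my code, so that you can run it in your own environment. I am using a RandomForestClassifier, and I am trying to find the decision_path for a selected sample in the RandomForestClassifier.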

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000,
                           n_features=6,
                           n_informative=3,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

# Creating a DataFrame
df = pd.DataFrame({'Feature 1': X[:, 0],
                   'Feature 2': X[:, 1],
                   'Feature 3': X[:, 2],
                   'Feature 4': X[:, 3],
                   'Feature 5': X[:, 4],
                   'Feature 6': X[:, 5],
                   'Class': y})


y_train = df['Class']
X_train = df.drop('Class',axis = 1)

rf = RandomForestClassifier(n_estimators=50,
                            random_state=0)

rf.fit(X_train, y_train)

The furthest I have got is:

#Extracting the decision path for instance i = 12
i_data = X_train.iloc[12].values.reshape(1,-1)
d_path = rf.decision_path(i_data)

print(d_path)

But the output does not make much sense to me:

(<1x7046 sparse matrix of type '<class 'numpy.int64'>'
    with 486 stored elements in Compressed Sparse Row format>,
 array([   0,  133,  282,  415,  588,  761,  910, 1041, 1182, 1309, 1432,
        1569, 1728, 1869, 2000, 2143, 2284, 2419, 2572, 2711, 2856, 2987,
        3128, 3261, 3430, 3549, 3704, 3839, 3980, 4127, 4258, 4389, 4534,
        4671, 4808, 4947, 5088, 5247, 5378, 5517, 5640, 5769, 5956, 6079,
        6226, 6385, 6524, 6655, 6780, 6925, 7046], dtype=int32))

I am trying to find the decision path for a particular sample in the DataFrame. Can anyone tell me how to do that?

The idea is to end up with something like this.

Best answer

The RandomForestClassifier.decision_path method returns a tuple (indicator, n_nodes_ptr). See the documentation: here

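So your variable node_indicator is a tuple, not the sparse matrix you were expecting. A tuple object has no attribute 'indices', which is why you get an error when you do: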

node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                    node_indicator.indptr[sample_id + 1]]

Try:

(node_indicator, _) = rf.decision_path(X_train)
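
To make the returned tuple more concrete, here is a minimal sketch of how the second element can be used to split the indicator matrix into one path per tree. It assumes the same data and forest settings as in the question, but fits on the plain NumPy array instead of the DataFrame; the names `paths`, `start` and `stop` are only illustrative. The columns `n_nodes_ptr[t]` to `n_nodes_ptr[t + 1]` of the indicator belong to tree `t`, so subtracting the offset gives node ids local to that tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same data and forest settings as in the question (fitted on the raw array)
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# decision_path returns (indicator, n_nodes_ptr) for the whole forest
indicator, n_nodes_ptr = rf.decision_path(X[12:13])  # keep 2-D shape (1, 6)
indicator = indicator.tocsr()

# Column indices of the non-zero entries in the single row
cols = np.sort(indicator.indices[indicator.indptr[0]:indicator.indptr[1]])

paths = []
for t in range(len(rf.estimators_)):
    start, stop = n_nodes_ptr[t], n_nodes_ptr[t + 1]
    # Columns in [start, stop) belong to tree t; shift them to local node ids
    paths.append(cols[(cols >= start) & (cols < stop)] - start)

print("nodes visited in tree 0:", paths[0])
```

Each entry of `paths` starts at node 0 (the root of that tree) and ends at the leaf the sample falls into, which can be cross-checked against `rf.apply(X[12:13])`.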

You can also print the decision path through each tree in the forest for a single sample id:

# Work with the underlying NumPy array so that rows and columns
# can be indexed by position below
X_train = X_train.values

sample_id = 0

for j, tree in enumerate(rf.estimators_):

    n_nodes = tree.tree_.node_count
    children_left = tree.tree_.children_left
    children_right = tree.tree_.children_right
    feature = tree.tree_.feature
    threshold = tree.tree_.threshold

    print("Decision path for DecisionTree {0}".format(j))
    node_indicator = tree.decision_path(X_train)
    leave_id = tree.apply(X_train)
    # Node ids visited by this sample, read from its row of the CSR matrix
    node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                        node_indicator.indptr[sample_id + 1]]

    print('Rules used to predict sample %s: ' % sample_id)
    for node_id in node_index:
        # Skip the leaf itself: it has no split to report
        if leave_id[sample_id] == node_id:
            continue

        if (X_train[sample_id, feature[node_id]] <= threshold[node_id]):
            threshold_sign = "<="
        else:
            threshold_sign = ">"

        print("decision id node %s : (X_train[%s, %s] (= %s) %s %s)"
              % (node_id,
                 sample_id,
                 feature[node_id],
                 X_train[sample_id, feature[node_id]],
                 threshold_sign,
                 threshold[node_id]))

Note that in your case you have 50 estimators, so the output may be a bit tedious to read.
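
If reading 50 printed paths is too much, one possible way to condense them is to count how many splits each tree applies to the sample and how often each feature is tested along the way. This is only a sketch under the same assumptions as the question (same data and forest settings, fitted on the NumPy array); the names `depths` and `feature_counts` are made up for illustration and are not part of scikit-learn.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

sample = X[12:13]          # one sample, kept 2-D
depths = []                # number of splits applied by each tree
feature_counts = Counter() # feature index -> times tested across the forest

for tree in rf.estimators_:
    # For a single row, the CSR indices are the ids of the visited nodes
    node_ids = tree.decision_path(sample).indices
    leaf_id = tree.apply(sample)[0]
    split_nodes = node_ids[node_ids != leaf_id]  # drop the leaf, keep the splits
    depths.append(len(split_nodes))
    feature_counts.update(tree.tree_.feature[split_nodes].tolist())

print("splits per tree:", depths)
print("feature usage along the paths:", sorted(feature_counts.items()))
```

The feature indices printed here are column positions (0 to 5), which correspond to 'Feature 1' to 'Feature 6' in the question's DataFrame.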

Regarding python - decision path of a random forest classifier, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48869343/
