python - Print the decision path of a specific sample in a random forest classifier

Reposted by: 太空狗. Updated: 2023-10-30 00:58:46

How can I print the decision path of a random forest for a specific sample, rather than the path of a single tree in the forest?

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000,
                           n_features=6,
                           n_informative=3,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

# Create a DataFrame with one column per feature plus the class label
df = pd.DataFrame({'Feature 1': X[:, 0],
                   'Feature 2': X[:, 1],
                   'Feature 3': X[:, 2],
                   'Feature 4': X[:, 3],
                   'Feature 5': X[:, 4],
                   'Feature 6': X[:, 5],
                   'Class': y})


y_train = df['Class']
X_train = df.drop('Class',axis = 1)

rf = RandomForestClassifier(n_estimators=10,
                               random_state=0)

rf.fit(X_train, y_train)

The decision_path method for random forests was introduced in v0.18. ( http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html )

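However, it outputs a sparse matrix that I am not sure how to interpret. Can anyone suggest how best to print the decision path of that particular sample, and then visualize it?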

#Extracting the decision path for instance i = 12
i_data = X_train.iloc[12].values.reshape(1,-1)
d_path = rf.decision_path(i_data)

print(d_path)

Output:

(<1x1432 sparse matrix of type '<class 'numpy.int64'>' with 96 stored elements in Compressed Sparse Row format>, array([   0,  133,  282,  415,  588,  761,  910, 1041, 1182, 1309, 1432], dtype=int32))
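
For reference, the returned pair can be read as `(indicator, n_nodes_ptr)`: the columns from `n_nodes_ptr[t]` up to `n_nodes_ptr[t + 1]` of `indicator` belong to tree `t`, and a nonzero entry means the sample passed through that node. The sketch below splits the matrix into per-tree node ids. It is only an illustration: the smaller data set and the names `visited`, `paths` and `local_ids` are assumptions of this example, not part of the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic problem (sizes are arbitrary, chosen for this sketch).
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

sample = X[12].reshape(1, -1)
indicator, n_nodes_ptr = rf.decision_path(sample)

# Row 0 of the CSR matrix holds the forest-wide ids of every visited node.
visited = indicator.indices[indicator.indptr[0]:indicator.indptr[1]]

paths = []
for t in range(len(rf.estimators_)):
    start, stop = n_nodes_ptr[t], n_nodes_ptr[t + 1]
    in_tree = visited[(visited >= start) & (visited < stop)]
    # Subtract the offset to get node ids local to tree t (root is 0).
    local_ids = np.sort(in_tree - start)
    paths.append(local_ids)
    print("tree %d: nodes %s" % (t, local_ids.tolist()))
```

Each printed list starts at the root (node 0) and ends at the leaf the sample lands in for that tree, so the 96 stored elements in the output above are simply the lengths of the ten per-tree paths added together.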

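Best answer

I found this code in the scikit-learn documentation and adapted it to your problem.

Since a RandomForestClassifier is a collection of DecisionTreeClassifier estimators, we can iterate over the individual trees and retrieve the sample's decision path in each of them. I hope this helps: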

import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000,
                           n_features=6,
                           n_informative=3,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimator = RandomForestClassifier(n_estimators=10,
                               random_state=0)
estimator.fit(X_train, y_train)

# Each decision tree in the forest has an attribute called tree_ which stores the entire
# tree structure and allows access to low level attributes. The binary tree
# tree_ is represented as a number of parallel arrays. The i-th element of each
# array holds information about the node `i`. Node 0 is the tree's root. NOTE:
# Some of the arrays only apply to either leaves or split nodes, resp. In this
# case the values of nodes of the other type are arbitrary!
#
# Among those arrays, we have:
#   - children_left, id of the left child of the node
#   - children_right, id of the right child of the node
#   - feature, feature used for splitting the node
#   - threshold, threshold value at the node
#

# Using those arrays, we can parse the tree structure:

#n_nodes = estimator.tree_.node_count
n_nodes_ = [t.tree_.node_count for t in estimator.estimators_]
children_left_ = [t.tree_.children_left for t in estimator.estimators_]
children_right_ = [t.tree_.children_right for t in estimator.estimators_]
feature_ = [t.tree_.feature for t in estimator.estimators_]
threshold_ = [t.tree_.threshold for t in estimator.estimators_]


def explore_tree(estimator, n_nodes, children_left,children_right, feature,threshold,
                suffix='', print_tree= False, sample_id=0, feature_names=None):

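    if not feature_names:
        # Default to generic names, one per input column. (The `feature` array
        # has one entry per tree node, so it cannot serve as a list of names.)
        feature_names = ["Feature_%d" % i for i in range(X.shape[1])]

    assert len(feature_names) == X.shape[1], "The feature names do not match the number of features."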
    # The tree structure can be traversed to compute various properties such
    # as the depth of each node and whether or not it is a leaf.
    node_depth = np.zeros(shape=n_nodes, dtype=np.int64)
    is_leaves = np.zeros(shape=n_nodes, dtype=bool)

    stack = [(0, -1)]  # seed is the root node id and its parent depth
    while len(stack) > 0:
        node_id, parent_depth = stack.pop()
        node_depth[node_id] = parent_depth + 1

        # If we have a test node
        if (children_left[node_id] != children_right[node_id]):
            stack.append((children_left[node_id], parent_depth + 1))
            stack.append((children_right[node_id], parent_depth + 1))
        else:
            is_leaves[node_id] = True

    print("The binary tree structure has %s nodes"
          % n_nodes)
    if print_tree:
        print("Tree structure: \n")
        for i in range(n_nodes):
            if is_leaves[i]:
                print("%snode=%s leaf node." % (node_depth[i] * "\t", i))
            else:
                print("%snode=%s test node: go to node %s if X[:, %s] <= %s else to "
                      "node %s."
                      % (node_depth[i] * "\t",
                         i,
                         children_left[i],
                         feature[i],
                         threshold[i],
                         children_right[i],
                         ))
            print("\n")
        print()

    # First let's retrieve the decision path of each sample. The decision_path
    # method allows to retrieve the node indicator functions. A non zero element of
    # indicator matrix at the position (i, j) indicates that the sample i goes
    # through the node j.

    node_indicator = estimator.decision_path(X_test)

    # Similarly, we can also have the leaves ids reached by each sample.

    leave_id = estimator.apply(X_test)

    # Now, it's possible to get the tests that were used to predict a sample or
    # a group of samples. First, let's make it for the sample.

    #sample_id = 0
    node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                        node_indicator.indptr[sample_id + 1]]

    print(X_test[sample_id,:])

    print('Rules used to predict sample %s: ' % sample_id)
    for node_id in node_index:
        # tabulation = " "*node_depth[node_id] #-> makes tabulation of each level of the tree
        tabulation = ""
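        if leave_id[sample_id] == node_id:
            print("%s==> Predicted leaf index \n"%(tabulation))
            # A leaf has no split: its feature and threshold entries are the
            # placeholder -2, so the line printed below for it is not a real
            # rule. Uncomment the next line to skip it.
            #continue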

        if (X_test[sample_id, feature[node_id]] <= threshold[node_id]):
            threshold_sign = "<="
        else:
            threshold_sign = ">"

        print("%sdecision id node %s : (X_test[%s, '%s'] (= %s) %s %s)"
              % (tabulation,
                 node_id,
                 sample_id,
                 feature_names[feature[node_id]],
                 X_test[sample_id, feature[node_id]],
                 threshold_sign,
                 threshold[node_id]))
    print("%sPrediction for sample %d: %s"%(tabulation,
                                          sample_id,
                                          estimator.predict(X_test)[sample_id]))

    # For a group of samples, we have the following common node.
    sample_ids = [sample_id, 1]
    common_nodes = (node_indicator.toarray()[sample_ids].sum(axis=0) ==
                    len(sample_ids))

    common_node_id = np.arange(n_nodes)[common_nodes]

    print("\nThe following samples %s share the node %s in the tree"
          % (sample_ids, common_node_id))
    print("It is %s %% of all nodes." % (100 * len(common_node_id) / n_nodes,))

    for sample_id_ in sample_ids:
        print("Prediction for sample %d: %s"%(sample_id_,
                                          estimator.predict(X_test)[sample_id_]))

To print the different trees in the random forest, you can iterate over the estimators like this:

for i,e in enumerate(estimator.estimators_):

    print("Tree %d\n"%i)
    explore_tree(estimator.estimators_[i],n_nodes_[i],children_left_[i],
                 children_right_[i], feature_[i],threshold_[i],
                suffix=i, sample_id=1, feature_names=["Feature_%d"%i for i in range(X.shape[1])])
    print('\n'*2)

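Here is the output for two of the trees (Tree 1 and Tree 2) of the RandomForestClassifier for sample_id = 1. The line printed after "==> Predicted leaf index" describes the leaf itself, whose feature and threshold are the placeholder -2, so it is not a real decision rule: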

Tree 1

The binary tree structure has 115 nodes
[ 2.36609963  1.32658511 -0.08002818  0.88295736  2.24224824 -0.71469736]
Rules used to predict sample 1: 
decision id node 0 : (X_test[1, 'Feature_3'] (= 0.8829573603562209) > 0.7038955688476562)
decision id node 86 : (X_test[1, 'Feature_2'] (= -0.08002817952064323) > -1.4465678930282593)
decision id node 92 : (X_test[1, 'Feature_0'] (= 2.366099632530947) > 0.7020512223243713)
decision id node 102 : (X_test[1, 'Feature_5'] (= -0.7146973587899221) > -1.2842652797698975)
decision id node 106 : (X_test[1, 'Feature_2'] (= -0.08002817952064323) > -0.4031955599784851)
decision id node 110 : (X_test[1, 'Feature_0'] (= 2.366099632530947) > 0.717217206954956)
decision id node 112 : (X_test[1, 'Feature_4'] (= 2.2422482391211678) <= 3.0181679725646973)
==> Predicted leaf index
decision id node 113 : (X_test[1, 'Feature_4'] (= 2.2422482391211678) > -2.0)
Prediction for sample 1: 1.0

The following samples [1, 1] share the node [  0  86  92 102 106 110 112 113] in the tree
It is 6.956521739130435 % of all nodes.
Prediction for sample 1: 1.0
Prediction for sample 1: 1.0



Tree 2

The binary tree structure has 135 nodes
[ 2.36609963  1.32658511 -0.08002818  0.88295736  2.24224824 -0.71469736]
Rules used to predict sample 1: 
decision id node 0 : (X_test[1, 'Feature_3'] (= 0.8829573603562209) > 0.5484486818313599)
decision id node 88 : (X_test[1, 'Feature_2'] (= -0.08002817952064323) > -0.7239605188369751)
decision id node 102 : (X_test[1, 'Feature_5'] (= -0.7146973587899221) > -1.6143207550048828)
decision id node 110 : (X_test[1, 'Feature_0'] (= 2.366099632530947) > 2.3399271965026855)
decision id node 130 : (X_test[1, 'Feature_5'] (= -0.7146973587899221) <= -0.5680553913116455)
decision id node 131 : (X_test[1, 'Feature_0'] (= 2.366099632530947) <= 2.4545814990997314)
==> Predicted leaf index
decision id node 132 : (X_test[1, 'Feature_4'] (= 2.2422482391211678) > -2.0)
Prediction for sample 1: 0.0

The following samples [1, 1] share the node [  0  88 102 110 130 131 132] in the tree
It is 5.185185185185185 % of all nodes.
Prediction for sample 1: 0.0
Prediction for sample 1: 0.0
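
The two trees above disagree on sample 1 (Tree 1 predicts 1.0, Tree 2 predicts 0.0), so no single path is "the" path of the forest. scikit-learn's random forest predicts by averaging the class probabilities of its trees and picking the most probable class. Here is a minimal sketch of that aggregation using the same data and settings as above; the names `forest`, `tree_probas` and `tree_votes` are my own choices for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           n_classes=2, random_state=0, shuffle=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X_train, y_train)

sample = X_test[1].reshape(1, -1)

# Class probabilities from each tree for this one sample,
# stacked into an array of shape (n_trees, n_classes).
tree_probas = np.array([t.predict_proba(sample)[0] for t in forest.estimators_])
tree_votes = tree_probas.argmax(axis=1)

# Average over the trees, then take the most probable class.
mean_proba = tree_probas.mean(axis=0)
forest_label = forest.classes_[mean_proba.argmax()]

print("votes per tree:", tree_votes.tolist())
print("mean probability:", mean_proba)
print("forest prediction:", forest_label)
```

Reading the per-tree paths next to `tree_votes` shows which trees, and therefore which rules, pushed the forest toward its final answer.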

This post, "python - Print the decision path of a specific sample in a random forest classifier", is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/48880557/
