
python - Distance between two leaves in a DecisionTreeClassifier

Reposted. Author: 太空狗. Updated: 2023-10-30 02:36:31

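Is there a way to compute the distance between two leaves in a decision tree?

By distance I mean the number of nodes on the path from one leaf to the other.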

[Image: example decision tree graph]

For example, in this example graph:

distance(leaf1, leaf2) == 1
distance(leaf1, leaf3) == 3
distance(leaf1, leaf4) == 4

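Thanks for your help!

Best answer

This example relies on additional Python packages, namely networkx and pydot. For that reason, the solution is commented generously. The question is tagged scikit-learn, so the solution is presented in Python.

Some data and a generic DecisionTreeClassifier: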

# load example data and classifier
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# for determining distance
from sklearn import tree
import networkx as nx
import pydot

# load data and fit a DecisionTreeClassifier
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train);

This function converts a fitted DecisionTreeClassifier into an undirected networkx MultiGraph using tree.export_graphviz, pydot.graph_from_dot_data, nx.drawing.nx_pydot.from_pydot, and nx.to_undirected.

def dt_to_mg(clf):
    """convert a fit DecisionTreeClassifier to a Networkx undirected MultiGraph"""
    # export the classifier to a string DOT format
    dot_data = tree.export_graphviz(clf)
    # Use pydot to convert the dot data to a graph
    dot_graph = pydot.graph_from_dot_data(dot_data)[0]
    # Import the graph data into Networkx
    MG = nx.drawing.nx_pydot.from_pydot(dot_graph)
    # Convert the tree to an undirected Networkx Graph
    uMG = MG.to_undirected()
    return uMG

uMG = dt_to_mg(clf)

Use nx.shortest_path_length to find the distance between any two nodes in the tree.

# get leaves
leaves = set(str(x) for x in clf.apply(X))
print(leaves)
{'10', '7', '9', '5', '3', '4'}
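
clf.apply(X) returns the leaf each sample lands in, so the set above only contains leaves that some row of X reaches. A minimal sketch of listing leaves from the tree structure instead is shown here. It is not part of the original answer, and it assumes scikit-learn's convention that a leaf has -1 in both child arrays. The toy arrays are hypothetical; with a fitted classifier one would presumably pass clf.tree_.children_left and clf.tree_.children_right.

```python
def all_leaves(children_left, children_right):
    """Return the ids of all nodes that have no children (marked with -1)."""
    return [
        node
        for node, (left, right) in enumerate(zip(children_left, children_right))
        if left == -1 and right == -1
    ]

# Hypothetical 9-node tree, numbered in preorder like scikit-learn does
toy_left = [1, 2, -1, -1, 5, -1, 7, -1, -1]
toy_right = [4, 3, -1, -1, 6, -1, 8, -1, -1]

# Node labels in the graph built from DOT data are strings, hence str()
toy_leaves = set(str(node) for node in all_leaves(toy_left, toy_right))
print(toy_leaves)
```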

# find the distance for two leaves
print(nx.shortest_path_length(uMG, source='9', target='5'))
5

# undirected graph means this should also work
print(nx.shortest_path_length(uMG, source='5', target='9'))
5

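shortest_path_length returns the number of edges between source and target. That is not the distance metric the OP asked for. I believe the number of nodes between them should be n_edges - 1.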

print(nx.shortest_path_length(uMG, source='5', target='9') - 1)
4
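
To sanity-check the n_edges - 1 relationship against the numbers in the question, here is a small sketch on a hand-built tree. The tree shape and node names are hypothetical: they were chosen only because they are consistent with the three distances the question quotes, and the original figure may look different. It uses a plain breadth-first search rather than networkx so it runs without extra packages.

```python
from collections import deque

# Hypothetical tree, consistent with the distances quoted in the question
edges = [
    ("root", "A"), ("root", "B"),
    ("A", "leaf1"), ("A", "leaf2"),
    ("B", "leaf3"), ("B", "C"),
    ("C", "leaf4"), ("C", "leaf5"),
]

# Build an undirected adjacency list
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, []).append(v)
    adjacency.setdefault(v, []).append(u)

def edge_distance(source, target):
    """Number of edges on the shortest path, found by breadth-first search."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    raise ValueError("no path between the two nodes")

def node_distance(source, target):
    """Number of nodes strictly between source and target."""
    return edge_distance(source, target) - 1

for other in ("leaf2", "leaf3", "leaf4"):
    print(other, edge_distance("leaf1", other), node_distance("leaf1", other))
```

On this tree the node counts come out as 1, 3 and 4, matching distance(leaf1, leaf2), distance(leaf1, leaf3) and distance(leaf1, leaf4) from the question.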

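Alternatively, find the distances between all pairs of leaves and store them in a dictionary or some other useful object for downstream computation.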

from itertools import combinations
leaf_distance_edges = {}
leaf_distance_nodes = {}
for leaf1, leaf2 in combinations(leaves, 2):
    d = nx.shortest_path_length(uMG, source=leaf1, target=leaf2)
    leaf_distance_edges[(leaf1, leaf2)] = d
    leaf_distance_nodes[(leaf1, leaf2)] = d - 1

leaf_distance_nodes
{('4', '9'): 5,
('4', '5'): 2,
('4', '10'): 5,
('4', '7'): 4,
('4', '3'): 1,
('9', '5'): 4,
('9', '10'): 1,
('9', '7'): 2,
('9', '3'): 5,
('5', '10'): 4,
('5', '7'): 3,
('5', '3'): 2,
('10', '7'): 2,
('10', '3'): 5,
('7', '3'): 4}
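
If installing networkx and pydot is not an option, the same node count can probably be obtained from the tree's child arrays alone. The sketch below is an alternative technique, not the approach of the answer above: it builds parent pointers, walks both leaves up to their lowest common ancestor, and subtracts one from the edge count. The function name leaf_distance and the toy arrays are hypothetical. With a fitted classifier, the inputs would be clf.tree_.children_left and clf.tree_.children_right, where -1 marks a missing child.

```python
def leaf_distance(children_left, children_right, leaf_a, leaf_b):
    """Count the nodes strictly between two nodes of a binary tree.

    Entry i of each child array is the child id of node i, or -1 for a leaf.
    """
    # Build parent pointers; the root never appears as a child
    parent = {}
    for node, (left, right) in enumerate(zip(children_left, children_right)):
        if left != -1:
            parent[int(left)] = node
        if right != -1:
            parent[int(right)] = node

    def path_to_root(node):
        path = [node]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    # Steps needed to reach each ancestor of leaf_a
    steps_from_a = {node: steps for steps, node in enumerate(path_to_root(leaf_a))}

    # The first ancestor of leaf_b shared with leaf_a is the lowest common one
    for steps_from_b, node in enumerate(path_to_root(leaf_b)):
        if node in steps_from_a:
            n_edges = steps_from_a[node] + steps_from_b
            return n_edges - 1
    raise ValueError("the two nodes do not share a root")

# Hypothetical 9-node tree in preorder numbering; leaves are 2, 3, 5, 7, 8
toy_left = [1, 2, -1, -1, 5, -1, 7, -1, -1]
toy_right = [4, 3, -1, -1, 6, -1, 8, -1, -1]

print(leaf_distance(toy_left, toy_right, 2, 3))
print(leaf_distance(toy_left, toy_right, 2, 5))
print(leaf_distance(toy_left, toy_right, 2, 7))
```

Unlike the graph version, node ids here are integers rather than strings, so values returned by clf.apply can be passed in directly.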

Regarding python - Distance between two leaves in a DecisionTreeClassifier, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53618651/
