
python - Export a Python scikit-learn model to PMML

Reposted. Author: 行者123. Updated: 2023-12-03 14:13:54

I want to export a Python scikit-learn model to PMML.

Which Python package is best suited for this?

I have read about Augustus, but I could not find any examples that use it with scikit-learn models.

Best answer

SkLearn2PMML

A thin wrapper around the JPMML-SkLearn command-line application. For a list of supported Scikit-Learn Estimator and Transformer types, please refer to the documentation of the JPMML-SkLearn project.



As @user1808924 noted, it supports Python 2.7 or 3.4+. It also requires Java 1.7+.
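Because SkLearn2PMML wraps a Java command-line application, it can help to confirm the Java version before exporting. Below is a minimal standard-library sketch, assuming `java` is on your PATH and that `java -version` prints a line such as `java version "1.8.0_292"` to stderr (as it conventionally does). The helper names `java_major_version` and `installed_java_major_version` are hypothetical.

```python
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` text, e.g. '1.8.0_292' -> 8."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report "1.x"; Java 9+ put the major version first.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major_version():
    """Run `java -version` (which writes to stderr); return None if java is not on PATH."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)


major = installed_java_major_version()
if major is None or major < 7:
    print("Java 1.7+ was not found on PATH; detected:", major)
```

The parser handles both the legacy `1.x` numbering and the newer scheme, so the same `>= 7` comparison works across versions.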

Install it with the following command (requires git):
pip install git+https://github.com/jpmml/sklearn2pmml.git

Here is an example of exporting a classification tree to PMML. First, grow the tree:
# example tree & viz from http://scikit-learn.org/stable/modules/tree.html
from sklearn import datasets, tree
iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)

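The SkLearn2PMML conversion takes two inputs: an estimator (our clf) and a mapper (for preprocessing steps such as discretization or PCA). Our mapper is very basic, because we are not applying any transformations.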
from sklearn_pandas import DataFrameMapper
default_mapper = DataFrameMapper([(i, None) for i in iris.feature_names + ['Species']])

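from sklearn2pmml import sklearn2pmml
# Note: this is the early (2016) sklearn2pmml API; later releases changed
# the function signature, so check the project README for your version.
sklearn2pmml(estimator=clf,
             mapper=default_mapper,
             pmml="D:/workspace/IrisClassificationTree.pmml")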

It is possible (though undocumented) to pass mapper=None, but then the predictor names are lost (you get x1 instead of sepal length, and so on).

Let's look at the .pmml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header>
    <Application name="JPMML-SkLearn" version="1.1.1"/>
    <Timestamp>2016-09-26T19:21:43Z</Timestamp>
  </Header>
  <DataDictionary>
    <DataField name="sepal length (cm)" optype="continuous" dataType="float"/>
    <DataField name="sepal width (cm)" optype="continuous" dataType="float"/>
    <DataField name="petal length (cm)" optype="continuous" dataType="float"/>
    <DataField name="petal width (cm)" optype="continuous" dataType="float"/>
    <DataField name="Species" optype="categorical" dataType="string">
      <Value value="setosa"/>
      <Value value="versicolor"/>
      <Value value="virginica"/>
    </DataField>
  </DataDictionary>
  <TreeModel functionName="classification" splitCharacteristic="binarySplit">
    <MiningSchema>
      <MiningField name="Species" usageType="target"/>
      <MiningField name="sepal length (cm)"/>
      <MiningField name="sepal width (cm)"/>
      <MiningField name="petal length (cm)"/>
      <MiningField name="petal width (cm)"/>
    </MiningSchema>
    <Output>
      <OutputField name="probability_setosa" dataType="double" feature="probability" value="setosa"/>
      <OutputField name="probability_versicolor" dataType="double" feature="probability" value="versicolor"/>
      <OutputField name="probability_virginica" dataType="double" feature="probability" value="virginica"/>
    </Output>
    <Node id="1">
      <True/>
      <Node id="2" score="setosa" recordCount="50.0">
        <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="0.8"/>
        <ScoreDistribution value="setosa" recordCount="50.0"/>
        <ScoreDistribution value="versicolor" recordCount="0.0"/>
        <ScoreDistribution value="virginica" recordCount="0.0"/>
      </Node>
      <Node id="3">
        <SimplePredicate field="petal width (cm)" operator="greaterThan" value="0.8"/>
        <Node id="4">
          <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.75"/>
          <Node id="5">
            <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.95"/>
            <Node id="6" score="versicolor" recordCount="47.0">
              <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.6500001"/>
              <ScoreDistribution value="setosa" recordCount="0.0"/>
              <ScoreDistribution value="versicolor" recordCount="47.0"/>
              <ScoreDistribution value="virginica" recordCount="0.0"/>
            </Node>
            <Node id="7" score="virginica" recordCount="1.0">
              <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.6500001"/>
              <ScoreDistribution value="setosa" recordCount="0.0"/>
              <ScoreDistribution value="versicolor" recordCount="0.0"/>
              <ScoreDistribution value="virginica" recordCount="1.0"/>
            </Node>
          </Node>
          <Node id="8">
            <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.95"/>
            <Node id="9" score="virginica" recordCount="3.0">
              <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.55"/>
              <ScoreDistribution value="setosa" recordCount="0.0"/>
              <ScoreDistribution value="versicolor" recordCount="0.0"/>
              <ScoreDistribution value="virginica" recordCount="3.0"/>
            </Node>
            <Node id="10">
              <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.55"/>
              <Node id="11" score="versicolor" recordCount="2.0">
                <SimplePredicate field="sepal length (cm)" operator="lessOrEqual" value="6.95"/>
                <ScoreDistribution value="setosa" recordCount="0.0"/>
                <ScoreDistribution value="versicolor" recordCount="2.0"/>
                <ScoreDistribution value="virginica" recordCount="0.0"/>
              </Node>
              <Node id="12" score="virginica" recordCount="1.0">
                <SimplePredicate field="sepal length (cm)" operator="greaterThan" value="6.95"/>
                <ScoreDistribution value="setosa" recordCount="0.0"/>
                <ScoreDistribution value="versicolor" recordCount="0.0"/>
                <ScoreDistribution value="virginica" recordCount="1.0"/>
              </Node>
            </Node>
          </Node>
        </Node>
        <Node id="13">
          <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.75"/>
          <Node id="14">
            <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.8500004"/>
            <Node id="15" score="virginica" recordCount="2.0">
              <SimplePredicate field="sepal width (cm)" operator="lessOrEqual" value="3.1"/>
              <ScoreDistribution value="setosa" recordCount="0.0"/>
              <ScoreDistribution value="versicolor" recordCount="0.0"/>
              <ScoreDistribution value="virginica" recordCount="2.0"/>
            </Node>
            <Node id="16" score="versicolor" recordCount="1.0">
              <SimplePredicate field="sepal width (cm)" operator="greaterThan" value="3.1"/>
              <ScoreDistribution value="setosa" recordCount="0.0"/>
              <ScoreDistribution value="versicolor" recordCount="1.0"/>
              <ScoreDistribution value="virginica" recordCount="0.0"/>
            </Node>
          </Node>
          <Node id="17" score="virginica" recordCount="43.0">
            <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.8500004"/>
            <ScoreDistribution value="setosa" recordCount="0.0"/>
            <ScoreDistribution value="versicolor" recordCount="0.0"/>
            <ScoreDistribution value="virginica" recordCount="43.0"/>
          </Node>
        </Node>
      </Node>
    </Node>
  </TreeModel>
</PMML>
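To inspect an exported file programmatically (for example, to confirm that the real feature names survived rather than generic ones like x1), the DataDictionary and MiningSchema can be read with Python's standard library. This is a minimal sketch: the helper name `describe_fields` is hypothetical, it assumes the PMML 4.3 namespace URI shown in the file above (other PMML versions use a different URI), and the embedded string is a trimmed excerpt of that file. In practice you would read the file written by `sklearn2pmml` instead.

```python
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_3"}

# Trimmed excerpt of the exported file (Header, Output and tree nodes omitted).
PMML_EXCERPT = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
<DataDictionary>
<DataField name="sepal length (cm)" optype="continuous" dataType="float"/>
<DataField name="sepal width (cm)" optype="continuous" dataType="float"/>
<DataField name="petal length (cm)" optype="continuous" dataType="float"/>
<DataField name="petal width (cm)" optype="continuous" dataType="float"/>
<DataField name="Species" optype="categorical" dataType="string">
<Value value="setosa"/>
<Value value="versicolor"/>
<Value value="virginica"/>
</DataField>
</DataDictionary>
<TreeModel functionName="classification" splitCharacteristic="binarySplit">
<MiningSchema>
<MiningField name="Species" usageType="target"/>
<MiningField name="sepal length (cm)"/>
<MiningField name="sepal width (cm)"/>
<MiningField name="petal length (cm)"/>
<MiningField name="petal width (cm)"/>
</MiningSchema>
</TreeModel>
</PMML>"""


def describe_fields(pmml_text):
    """Return (data_fields, usage): DataDictionary metadata and MiningSchema usage types."""
    root = ET.fromstring(pmml_text)
    data_fields = {}
    for field in root.iterfind("pmml:DataDictionary/pmml:DataField", NS):
        values = [v.get("value") for v in field.iterfind("pmml:Value", NS)]
        data_fields[field.get("name")] = (field.get("optype"), field.get("dataType"), values)
    schema = root.find("pmml:TreeModel/pmml:MiningSchema", NS)
    # A MiningField without usageType is an input: PMML's default usageType is "active".
    usage = {mf.get("name"): mf.get("usageType", "active")
             for mf in schema.iterfind("pmml:MiningField", NS)}
    return data_fields, usage


data_fields, usage = describe_fields(PMML_EXCERPT)
print([name for name, kind in usage.items() if kind == "active"])
```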

The first split (node 1) is on petal width at 0.8. Node 2 (petal width <= 0.8) captures all of the setosa samples and nothing else.
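The routing described above (start at the root, enter the first child whose predicate holds, stop at a leaf) can be checked with a small scorer built on Python's standard library. This is a sketch, not a PMML engine: the names `predicate_holds`, `score` and `PMML_EXCERPT` are hypothetical, only `<True/>` and numeric `SimplePredicate` are handled, and when no child matches it returns `None`, mirroring PMML's default `noTrueChildStrategy`. The embedded excerpt is the tree above with node 4's subtree and the ScoreDistribution elements omitted.

```python
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_3"}

# Comparison operators for numeric SimplePredicate values.
OPS = {
    "lessThan": lambda a, b: a < b,
    "lessOrEqual": lambda a, b: a <= b,
    "greaterThan": lambda a, b: a > b,
    "greaterOrEqual": lambda a, b: a >= b,
    "equal": lambda a, b: a == b,
    "notEqual": lambda a, b: a != b,
}

# Trimmed excerpt of the exported tree (node 4's subtree omitted).
PMML_EXCERPT = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
<TreeModel functionName="classification" splitCharacteristic="binarySplit">
<Node id="1">
<True/>
<Node id="2" score="setosa">
<SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="0.8"/>
</Node>
<Node id="3">
<SimplePredicate field="petal width (cm)" operator="greaterThan" value="0.8"/>
<Node id="13">
<SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.75"/>
<Node id="14">
<SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.8500004"/>
<Node id="15" score="virginica">
<SimplePredicate field="sepal width (cm)" operator="lessOrEqual" value="3.1"/>
</Node>
<Node id="16" score="versicolor">
<SimplePredicate field="sepal width (cm)" operator="greaterThan" value="3.1"/>
</Node>
</Node>
<Node id="17" score="virginica">
<SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.8500004"/>
</Node>
</Node>
</Node>
</Node>
</TreeModel>
</PMML>"""


def predicate_holds(node, record):
    """Evaluate the predicate of one <Node>; only <True/> and numeric SimplePredicate are handled."""
    if node.find("pmml:True", NS) is not None:
        return True
    pred = node.find("pmml:SimplePredicate", NS)
    if pred is None:
        raise NotImplementedError("this sketch only handles True and SimplePredicate")
    return OPS[pred.get("operator")](record[pred.get("field")], float(pred.get("value")))


def score(pmml_text, record):
    """Walk from the root, entering the first child whose predicate holds, until a leaf."""
    node = ET.fromstring(pmml_text).find("pmml:TreeModel/pmml:Node", NS)
    if not predicate_holds(node, record):
        return None
    while True:
        children = node.findall("pmml:Node", NS)
        if not children:
            return node.get("score")
        for child in children:
            if predicate_holds(child, record):
                node = child
                break
        else:
            # No child matched: return no prediction.
            return None


print(score(PMML_EXCERPT, {"petal width (cm)": 0.2}))  # setosa
```

A record with petal width 0.2 stops at node 2 without ever reading the other fields, which is exactly the "node 2 captures all setosa" behavior described above.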

You can compare the PMML output with the graphviz output:
from io import StringIO  # sklearn.externals.six is no longer available in recent scikit-learn
import pydotplus  # this might be pydot for Python 2.7
dot_data = StringIO()
tree.export_graphviz(clf,
                     out_file=dot_data,
                     feature_names=iris.feature_names,
                     class_names=iris.target_names,
                     filled=True, rounded=True,
                     special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("D:/workspace/iris.pdf")
# for in-line display, you can also do:
# from IPython.display import Image
# Image(graph.create_png())
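If graphviz is not installed, a rough side-by-side comparison is still possible by printing the PMML tree as indented rules. This standard-library sketch is illustrative only: `tree_lines`, `render_tree` and `PMML_EXCERPT` are hypothetical names, the operator symbols are a display choice, and the excerpt is the tree above with the node 4 and node 14 subtrees and the ScoreDistribution elements omitted. In practice you would pass the full file's text.

```python
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_3"}
SYMBOLS = {"lessThan": "<", "lessOrEqual": "<=", "greaterThan": ">",
           "greaterOrEqual": ">=", "equal": "==", "notEqual": "!="}

# Trimmed excerpt of the exported tree (node 4 and node 14 subtrees omitted).
PMML_EXCERPT = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
<TreeModel functionName="classification" splitCharacteristic="binarySplit">
<Node id="1">
<True/>
<Node id="2" score="setosa" recordCount="50.0">
<SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="0.8"/>
</Node>
<Node id="3">
<SimplePredicate field="petal width (cm)" operator="greaterThan" value="0.8"/>
<Node id="13">
<SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.75"/>
<Node id="17" score="virginica" recordCount="43.0">
<SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.8500004"/>
</Node>
</Node>
</Node>
</Node>
</TreeModel>
</PMML>"""


def tree_lines(node, depth=0):
    """Yield one indented line per <Node>: its split condition, plus its score if it has one."""
    pred = node.find("pmml:SimplePredicate", NS)
    if pred is None:
        # Nodes without a SimplePredicate (here only the root, which has <True/>).
        cond = "root"
    else:
        cond = "{} {} {}".format(pred.get("field"), SYMBOLS[pred.get("operator")], pred.get("value"))
    leaf = " -> " + node.get("score") if node.get("score") else ""
    yield "  " * depth + cond + leaf
    for child in node.iterfind("pmml:Node", NS):
        yield from tree_lines(child, depth + 1)


def render_tree(pmml_text):
    root = ET.fromstring(pmml_text)
    return "\n".join(tree_lines(root.find("pmml:TreeModel/pmml:Node", NS)))


print(render_tree(PMML_EXCERPT))
```

Each indentation level corresponds to one level of the graphviz drawing, so the thresholds (0.8, 1.75, 4.8500004, ...) can be matched node by node.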


Regarding "python - Export a Python scikit-learn model to PMML", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33221331/
