
python - Outputting a decision tree from a Pipeline

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:23:11

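Hello. I am new to machine learning with the sklearn library. I tried to put a decision tree into a pipeline and then predict with the model and output the tree, but when I run the code below I get this error: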

'Pipeline' object has no attribute 'tree_'

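So I am wondering whether a pipeline simply does not support tree output, and how I can solve this. I also tried using the decision tree class directly, but then I get a different error: "setting an array element with a sequence". I understand this is probably because my vectors have different dimensions, but I still do not know how to handle that case.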

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import Pipeline

from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_text
from sklearn import tree


# a function that reads the corpus, tokenizes it and returns the documents
# and their labels
def read_corpus(corpus_file, use_sentiment):
    documents = []
    labels = []
    with open(corpus_file, encoding='utf-8') as f:
        for line in f:
            tokens = line.strip().split()

            documents.append(tokens[3:])

            if use_sentiment:
                # 2-class problem: positive vs negative
                labels.append(tokens[1])
            else:
                # 6-class problem: books, camera, dvd, health, music, software
                labels.append(tokens[0])

    return documents, labels

# a dummy function that just returns its input
def identity(x):
    return x

# read the data and split it into train and test
X, Y = read_corpus('/Users/dengchenglong/Downloads/trainset', use_sentiment=False)
split_point = int(0.75*len(X))
Xtrain = X[:split_point]
Ytrain = Y[:split_point]
Xtest = X[split_point:]
Ytest = Y[split_point:]

# set to True to use the TF-IDF vectorizer instead of raw counts
tfidf = False

# we use a dummy function as tokenizer and preprocessor,
# since the texts are already preprocessed and tokenized.
if tfidf:
    vec = TfidfVectorizer(preprocessor=identity,
                          tokenizer=identity)
else:
    vec = CountVectorizer(preprocessor=identity,
                          tokenizer=identity)


# combine the vectorizer with a decision tree classifier
classifier = Pipeline([('vec', vec),
                       ('cls', tree.DecisionTreeClassifier())])


# train the classifier on the train dataset
decision_tree = classifier.fit(Xtrain, Ytrain)


# predict the labels of the test data
Yguess = classifier.predict(Xtest)
# plot the tree (this call raises: 'Pipeline' object has no attribute 'tree_')
tree.plot_tree(classifier.fit(Xtest, Ytest))
# report performance of the classifier
print(accuracy_score(Ytest, Yguess))
print(classification_report(Ytest, Yguess))

Best answer

What happens if you try this:

from sklearn.pipeline import make_pipeline

# combine the vectorizer with a decision tree classifier
clf = DecisionTreeClassifier()
classifier = make_pipeline(vec, clf)

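It looks like you have to instantiate the model you want to apply before using it in the pipeline. Let me know whether this works and, if it does not, which error it returns. From the Scikit-learn documentation, an example: Make pipeline example with trees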
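Separately from the suggestion above, the error message itself points at what plot_tree received: a Pipeline object, which has no tree_ attribute, rather than the fitted tree inside it. Below is a minimal sketch of one way to reach that inner estimator, assuming a tiny made-up corpus (docs and labels are hypothetical stand-ins for the asker's data) and the step names that make_pipeline derives from the lowercased class names. It is an illustration under those assumptions, not a confirmed fix for the original data set.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier, export_text


def identity(x):
    """Return the input unchanged (the documents are already tokenized)."""
    return x


# Hypothetical toy corpus: token lists of different lengths, as in the question.
docs = [
    ["great", "book", "story"],
    ["boring", "book"],
    ["sharp", "camera", "lens"],
    ["camera", "battery", "died"],
]
labels = ["books", "books", "camera", "camera"]

# Same vectorizer setup as in the question; scikit-learn may warn that
# token_pattern is unused because a custom tokenizer is supplied.
vec = CountVectorizer(preprocessor=identity, tokenizer=identity)
pipe = make_pipeline(vec, DecisionTreeClassifier(random_state=0))
pipe.fit(docs, labels)

# The pipeline only wraps its steps; the fitted tree is one of them.
# make_pipeline names each step after its lowercased class name.
fitted_vec = pipe.named_steps["countvectorizer"]
fitted_tree = pipe.named_steps["decisiontreeclassifier"]

# Recover column names from the fitted vocabulary (token -> column index).
feature_names = sorted(fitted_vec.vocabulary_, key=fitted_vec.vocabulary_.get)

# export_text takes the fitted tree estimator, not the pipeline.
report = export_text(fitted_tree, feature_names=feature_names)
print(report)
```

With a Pipeline built from explicit names, as in the question, the same lookup would be classifier.named_steps['cls'], and tree.plot_tree should accept that estimator too if matplotlib is installed. The vectorizer step is also what handles the "setting an array element with a sequence" error: it turns token lists of different lengths into a matrix with one fixed-width row per document, which is the input the tree expects.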

Regarding python - Outputting a decision tree from a Pipeline, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58030693/
