
python - How to analyze the intermediate steps of an sklearn Pipeline?

Reposted. Author: 行者123. Updated: 2023-11-30 09:16:40

I am using sklearn to classify text. I use CountVectorizer and TfidfTransformer to create a sparse matrix.

I apply several preprocessing steps to the strings in a custom tokenize_and_stem function, which is passed to CountVectorizer as its tokenizer.

from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer

# tokenize_and_stem is the custom tokenizer described above (not shown here)
SVM = Pipeline([
    ('vect', CountVectorizer(max_features=100000, ngram_range=(1, 2),
                             stop_words='english', tokenizer=tokenize_and_stem)),
    ('tfidf', TfidfTransformer(use_idf=True)),
    ('clf-svm', LinearSVC(C=1)),
])

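My question: is there a simple way to view or store the output of step 1 or step 2 of the Pipeline, so that I can analyze what kind of array is passed to the SVM?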

Best answer

You can get the output of the intermediate steps as follows.

Based on the source code:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect', TfidfVectorizer(ngram_range=(1, 2), stop_words='english')),
    ('clf-svm', LinearSVC(C=1)),
])
X = ["I want to test this document", "let us see how it works", "I am okay and you ?"]

pipeline.fit(X,[0,1,1])

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(pipeline.named_steps['vect'].get_feature_names_out().tolist())

['document', 'let', 'let works', 'okay', 'test', 'test document', 'want', 'want test', 'works']
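To relate the feature names printed above to column positions in the transformed matrix, the fitted vectorizer's `vocabulary_` attribute maps each term to its column index. The following is a minimal sketch that reuses the three example documents from this answer; the names `docs` and `term_to_column` are illustrative and not part of the original code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["I want to test this document", "let us see how it works", "I am okay and you ?"]

vect = TfidfVectorizer(ngram_range=(1, 2), stop_words='english')
vect.fit(docs)

# vocabulary_ maps each term to its column index in the transformed matrix
term_to_column = {term: int(col) for term, col in vect.vocabulary_.items()}

# Sort by column index so the order matches the feature-name list
for term, col in sorted(term_to_column.items(), key=lambda item: item[1]):
    print(col, term)
```

With this mapping, a column index in the sparse output can be read back as a term, for example column 7 corresponds to 'want test'.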

# Here is where you can get the output of the intermediate steps
Xt = X

for name, transform in pipeline.steps[:-1]:
    if transform is not None:
        Xt = transform.transform(Xt)

print(Xt)



(0, 7) 0.4472135954999579
(0, 6) 0.4472135954999579
(0, 5) 0.4472135954999579
(0, 4) 0.4472135954999579
(0, 0) 0.4472135954999579
(1, 8) 0.5773502691896257
(1, 2) 0.5773502691896257
(1, 1) 0.5773502691896257
(2, 3) 1.0
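For the three-step pipeline in the question, the same loop can store each step's output under its step name, which covers the "view/store step 1/2" part of the question. This is only a sketch under some assumptions: it uses the default tokenizer in place of the asker's `tokenize_and_stem` (which is not shown), it reuses the toy documents from this answer, and `intermediate` and `svm_input` are illustrative names. Slicing a pipeline with `pipeline[:-1]` requires scikit-learn 0.21 or later.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

docs = ["I want to test this document", "let us see how it works", "I am okay and you ?"]
labels = [0, 1, 1]

# Same structure as the question's pipeline, minus the custom tokenizer
pipeline = Pipeline([
    ('vect', CountVectorizer(ngram_range=(1, 2), stop_words='english')),
    ('tfidf', TfidfTransformer(use_idf=True)),
    ('clf-svm', LinearSVC(C=1)),
])
pipeline.fit(docs, labels)

# Run the fitted transformers one by one and keep every intermediate result
intermediate = {}
Xt = docs
for name, step in pipeline.steps[:-1]:
    if step is None or step == 'passthrough':
        continue  # skipped steps leave the data unchanged
    Xt = step.transform(Xt)
    intermediate[name] = Xt

for name, matrix in intermediate.items():
    # Report the container type, shape, and number of stored non-zero entries
    print(name, type(matrix).__name__, matrix.shape, matrix.nnz)

# Shortcut: a sliced pipeline applies all steps except the final classifier
svm_input = pipeline[:-1].transform(docs)
```

Here `intermediate['vect']` holds the raw n-gram counts and `intermediate['tfidf']` holds the weighted matrix that LinearSVC receives; `svm_input` should match the latter.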

This post is based on a similar question found on Stack Overflow, "python - How to analyze the intermediate steps of an sklearn Pipeline?": https://stackoverflow.com/questions/54332654/
