
python - How to extract the vocabulary from a Pipeline

Repost. Author: 太空狗. Updated: 2023-10-30 00:01:22

I can extract the vocabulary from a CountVectorizerModel as follows:

from pyspark.ml.feature import CountVectorizer, StopWordsRemover

fl = StopWordsRemover(inputCol="words", outputCol="filtered")
df = fl.transform(df)
cv = CountVectorizer(inputCol="filtered", outputCol="rawFeatures")
model = cv.fit(df)

print(model.vocabulary)

The code above prints the vocabulary as a list in which each term's position is its index.

Now I have built a pipeline from the code above, as follows:

from pyspark.ml import Pipeline

rm_stop_words = StopWordsRemover(inputCol="words", outputCol="filtered")
count_freq = CountVectorizer(inputCol=rm_stop_words.getOutputCol(), outputCol="rawFeatures")

pipeline = Pipeline(stages=[rm_stop_words, count_freq])
model = pipeline.fit(dfm)
df = model.transform(dfm)

print(model.vocabulary) # This won't work as it's not CountVectorizerModel

This raises the following error:

print(len(model.vocabulary))

AttributeError: 'PipelineModel' object has no attribute 'vocabulary'

So how do I extract a model attribute from a pipeline?

Best answer

The same way as for any other stage attribute. Extract the stages from the fitted PipelineModel:

stages = model.stages

Find the one(s) you are interested in:

from pyspark.ml.feature import CountVectorizerModel

vectorizers = [s for s in stages if isinstance(s, CountVectorizerModel)]

And get the field you need:

[v.vocabulary for v in vectorizers]
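
The three steps above can be wrapped into one small helper. This is a minimal sketch, not part of the original answer: the name `stages_of_type` and the `Fake*` classes are hypothetical, and the stand-ins exist only so the example runs without a Spark session. The helper relies on nothing more than the `stages` attribute shown above.

```python
from typing import Any, List, Type


def stages_of_type(pipeline_model: Any, stage_type: Type) -> List[Any]:
    """Return every stage of pipeline_model that is an instance of stage_type."""
    return [s for s in pipeline_model.stages if isinstance(s, stage_type)]


# Hypothetical stand-ins so the sketch runs without a Spark session.
# With PySpark, pass the fitted PipelineModel and CountVectorizerModel instead.
class FakeStopWordsRemover:
    pass


class FakeCountVectorizerModel:
    def __init__(self, vocabulary):
        self.vocabulary = vocabulary


class FakePipelineModel:
    def __init__(self, stages):
        self.stages = stages


fake_model = FakePipelineModel(
    [FakeStopWordsRemover(), FakeCountVectorizerModel(["spark", "python", "pipeline"])]
)

# Filter the stages by type, then read the attribute from each match.
vectorizers = stages_of_type(fake_model, FakeCountVectorizerModel)
vocabularies = [v.vocabulary for v in vectorizers]
print(vocabularies)

# When the stage position is known, plain indexing works too:
# the vectorizer is the last stage here.
print(fake_model.stages[-1].vocabulary)
```

With the pipeline from the question, `stages_of_type(model, CountVectorizerModel)[0].vocabulary` should return the vocabulary. Because the CountVectorizer is the last stage of that pipeline, `model.stages[-1].vocabulary` is a shorter alternative, though filtering by type keeps working if stages are later added or reordered.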

Regarding "python - How to extract the vocabulary from a Pipeline", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46715559/
