
python - How to see all documents of each topic in LDA?


I am using LDA to understand the topics of a large collection of texts. I managed to print the topics, but I would like to print every text that belongs to each topic.

Data:

it's very hot outside summer
there are not many flowers in winter
in the winter we eat hot food
in the summer we go to the sea
in winter we used many clothes
in summer we are on vacation
winter and summer are two seasons of the year

I tried it with sklearn and I can print the topics, but I want to print all the phrases that belong to each topic:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import numpy as np
import pandas

# load the comments column as a list of strings
dataset = pandas.read_csv('data.csv', encoding = 'utf-8')
comments = dataset['comments']
comments_list = comments.values.tolist()

# bag-of-words representation
vect = CountVectorizer()
X = vect.fit_transform(comments_list)

lda = LatentDirichletAllocation(n_topics = 2, learning_method = "batch", max_iter = 25, random_state = 0)

# document-topic weights, shape (n_documents, n_topics)
document_topics = lda.fit_transform(X)

# words sorted by weight within each topic
sorting = np.argsort(lda.components_, axis = 1)[:, ::-1]
feature_names = np.array(vect.get_feature_names())

# attempt to sort the documents by topic and print the top ones
docs = np.argsort(comments_list[:, 1])[::-1]
for i in docs[:4]:
    print(' '.join(i) + '\n')

Desired output:

Topic 1
it's very hot outside summer
in the summer we go to the sea
in summer we are on vacation
winter and summer are two seasons of the year

Topic 2
there are not many flowers in winter
in the winter we eat hot food
in winter we used many clothes
winter and summer are two seasons of the year

Best Answer

If you want to print the documents, you need to specify them:

print(" ".join(comments_list[i].split(",")[:2]) + "\n")

Regarding "python - How to see all documents of each topic in LDA?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51694637/
