
python - How do I use a list of lists or a list of sets with TfidfVectorizer?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:16:49

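I am using sklearn's TfidfVectorizer for text classification.

I know this vectorizer expects raw text as input, but passing a list of strings also works (see input1).

However, if I pass a list of lists (or a list of sets), I get the AttributeError shown below.

Does anyone know how to solve this? Thanks in advance!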

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(min_df=1, stop_words="english")
input1 = ["This", "is", "a", "test"]
input2 = [["This", "is", "a", "test"], ["It", "is", "raining", "today"]]

print(vectorizer.fit_transform(input1))  # works
print(vectorizer.fit_transform(input2))  # raises AttributeError

Output for input1:
(3, 0) 1.0

Output for input2:

Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 1381, in fit_transform
    X = super(TfidfVectorizer, self).fit_transform(raw_documents)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 869, in fit_transform
    self.fixed_vocabulary_)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 792, in _count_vocab
    for feature in analyze(doc):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 266, in
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 232, in
    return lambda x: strip_accents(x.lower())
AttributeError: 'list' object has no attribute 'lower'

Best answer

Note that input1 works, but it treats each element of the list (each string) as a separate document to vectorize.
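To make that point concrete, here is a minimal sketch that reuses the question's vectorizer settings and inspects the result for input1. The variable name X1 is introduced here for illustration. The traceback in the question points the same way: the preprocessing step calls `.lower()` on each document, so each document is expected to be a string.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(min_df=1, stop_words="english")
input1 = ["This", "is", "a", "test"]

# X1 is an illustrative name, not from the original post.
X1 = vectorizer.fit_transform(input1)

# One row per list element: four one-word "documents", not one sentence.
print(X1.shape)  # (4, 1)

# Only "test" survives: "this" and "is" are English stop words, and the
# default token pattern ignores single-character tokens such as "a".
print(vectorizer.vocabulary_)
```

This matches the single entry `(3, 0) 1.0` printed in the question: only the fourth document (row 3) contains a term that is kept.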

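In the case of input2, I assume you want to vectorize each "sentence" (sublist). One solution is to join each sublist into a single string with a list comprehension: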

input2_corrected = [" ".join(x) for x in input2]

This produces:

['This is a test', 'It is raining today']

Passing this list of strings to fit_transform no longer raises the AttributeError.
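Putting the pieces together, the following sketch applies the join to both a list of lists and a list of sets. The helper `join_tokens` and the names `input2_sets`, `X_lists`, `X_sets`, `pretokenized` and `X_tokens` are hypothetical, introduced only for this example. The last part shows an alternative that is not from the original answer: passing a callable as `analyzer`, which hands each document to the callable unchanged. It should be read as one possible option, with the caveat that lowercasing and stop-word removal are not applied in that mode.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

input2 = [["This", "is", "a", "test"], ["It", "is", "raining", "today"]]
# Hypothetical set-based variant of the same data (not in the original post).
input2_sets = [set(doc) for doc in input2]


def join_tokens(docs):
    """Turn each iterable of tokens into one space-separated string."""
    return [" ".join(tokens) for tokens in docs]


vectorizer = TfidfVectorizer(min_df=1, stop_words="english")

# Each sublist becomes one document, so the matrix has two rows.
X_lists = vectorizer.fit_transform(join_tokens(input2))
print(X_lists.shape)  # (2, 3)
print(sorted(vectorizer.vocabulary_))  # ['raining', 'test', 'today']

# Sets have no defined order, so the joined string may differ between runs;
# with the default unigram features the resulting matrix is unaffected.
X_sets = vectorizer.fit_transform(join_tokens(input2_sets))
print(X_sets.shape)  # (2, 3)

# Alternative (not from the original answer): a callable analyzer receives
# each document as-is, so the token lists can be passed in directly.
# Lowercasing and stop-word removal are skipped in this mode.
pretokenized = TfidfVectorizer(analyzer=lambda tokens: tokens)
X_tokens = pretokenized.fit_transform(input2)
print(X_tokens.shape)  # (2, 7)
```

The join approach keeps the vectorizer's usual preprocessing (lowercasing, stop words, token pattern), which is probably what most text-classification setups want. The callable analyzer is mainly useful when the tokens were already cleaned upstream and should be used exactly as given.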

Regarding "python - How do I use a list of lists or a list of sets with TfidfVectorizer?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50633153/
