
python - CountVectorizer(): 'StreamBackedCorpusView' object has no attribute 'lower'

Reposted. Author: 行者123. Updated: 2023-12-01 02:37:52

I am trying to instantiate CountVectorizer() and run it on the NLTK movie reviews corpus with the following code:

>>> import nltk
>>> import nltk.corpus
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> from nltk.corpus import movie_reviews
>>> neg_rev = movie_reviews.fileids('neg')
>>> pos_rev = movie_reviews.fileids('pos')
>>> rev_list = []  # Empty list
>>> for rev in neg_rev:
...     rev_list.append(nltk.corpus.movie_reviews.words(rev))
...
>>> for rev_pos in pos_rev:
...     rev_list.append(nltk.corpus.movie_reviews.words(rev_pos))
...
>>> count_vect = CountVectorizer()
>>> X_count_vect = count_vect.fit_transform(rev_list)

I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-37-00e9047daa67> in <module>()
----> 1 X_count_vect = count_vect.fit_transform(rev_list)

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
837
838 vocabulary, X = self._count_vocab(raw_documents,
--> 839 self.fixed_vocabulary_)
840
841 if self.binary:

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab)
760 for doc in raw_documents:
761 feature_counter = {}
--> 762 for feature in analyze(doc):
763 try:
764 feature_idx = vocabulary[feature]

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in <lambda>(doc)
239
240 return lambda doc: self._word_ngrams(
--> 241 tokenize(preprocess(self.decode(doc))), stop_words)
242
243 else:

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in <lambda>(x)
205
206 if self.lowercase:
--> 207 return lambda x: strip_accents(x.lower())
208 else:
209 return strip_accents

AttributeError: 'StreamBackedCorpusView' object has no attribute 'lower'

nltk.corpus.movie_reviews.words(rev_pos) contains tokenized sentences, for example:

['films', 'adapted', 'from', 'comic', 'books', 'have', ...]

Can anyone tell me what I am doing wrong? I assume I missed a step when building the list of movie reviews (rev_list).

TIA

Best answer

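It looks like .words() does not actually return a plain list of tokens. It returns a StreamBackedCorpusView object, so rev_list ends up as a list of such views. A view lets you retrieve the tokens, but it is not a string, and as the traceback shows, CountVectorizer calls .lower() on each document when lowercasing is enabled.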
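To make the failure mode concrete, here is a minimal sketch that uses only the standard library. A plain list of tokens stands in for the StreamBackedCorpusView (an assumption made for illustration; the real view is a different class, but it likewise has no .lower method), and the hypothetical `preprocess` function mirrors the `x.lower()` call visible in the traceback.

```python
def preprocess(doc):
    # Mirrors the lowercasing step from the traceback: strip_accents(x.lower())
    return doc.lower()


# Stand-in for what movie_reviews.words(fileid) yields: a sequence of tokens.
tokens = ['films', 'adapted', 'from', 'comic', 'books']

try:
    preprocess(tokens)          # a token sequence has no .lower() method
    failed = False
except AttributeError as exc:
    failed = True
    print(exc)

# Joining the tokens produces a string, which does support .lower().
joined = preprocess(" ".join(tokens))
print(joined)
```

The same reasoning applies to each element of rev_list: each one is a token sequence where a string is expected.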

You can, however, retrieve the tokens from the view. For more details on working with StreamBackedCorpusView, see the following link.

http://nltk.sourceforge.net/corpusview/corpusview.StreamBackedCorpusView-class.html
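One possible fix, sketched under the assumption that the goal is a bag-of-words matrix with one row per review: give CountVectorizer one string per review instead of a token view. The helper names `tokens_to_document` and `build_count_matrix` are hypothetical, and the sketch assumes that nltk and scikit-learn are installed and that the corpus has already been downloaded with nltk.download('movie_reviews').

```python
def tokens_to_document(tokens):
    """Join an iterable of tokens (e.g. a corpus view) into a single string."""
    return " ".join(tokens)


def build_count_matrix():
    """Vectorize every movie review; returns (vectorizer, sparse count matrix)."""
    # Imported lazily so tokens_to_document can be used without these packages.
    from nltk.corpus import movie_reviews
    from sklearn.feature_extraction.text import CountVectorizer

    fileids = movie_reviews.fileids('neg') + movie_reviews.fileids('pos')
    # One plain string per review, which CountVectorizer expects by default.
    documents = [tokens_to_document(movie_reviews.words(f)) for f in fileids]

    count_vect = CountVectorizer()
    X_count_vect = count_vect.fit_transform(documents)
    return count_vect, X_count_vect
```

Calling `count_vect, X_count_vect = build_count_matrix()` should then run without the AttributeError. Two alternatives that may also work: build the documents with `movie_reviews.raw(f)`, which returns the review text as a string, or keep NLTK's tokenization by passing a callable analyzer, as in `CountVectorizer(analyzer=lambda tokens: tokens)`, in which case scikit-learn skips its own preprocessing and tokenizing.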

About python - CountVectorizer(): 'StreamBackedCorpusView' object has no attribute 'lower': a similar question was found on Stack Overflow: https://stackoverflow.com/questions/46034861/
