python - 'NoneType' object is not iterable for Vectorizer sklearn

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:39:49

I have imported text data into a pandas DataFrame and want to apply a vectorizer, so I am using sklearn for TF-IDF and similar steps.

The first step is cleaning the text:

# Creating Function
import string
from nltk.corpus import stopwords

def text_process(sms):
    # Remove punctuation characters
    nonpunc = [char for char in sms if char not in string.punctuation]
    nonpunc = ''.join(nonpunc)
    # Split into words and drop English stop words
    return [word for word in nonpunc.split() if word.lower() not in stopwords.words('english')]
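If the column can contain missing values, one defensive option is to have the analyzer return an empty token list for anything that is not a string. The sketch below assumes a small hard-coded stop-word set (the hypothetical `STOP_WORDS`) in place of NLTK's `stopwords.words('english')` so it runs without downloading the NLTK corpus; `text_process_safe` is likewise a hypothetical name.

```python
import string

# Hypothetical stand-in for nltk's stopwords.words('english'), kept small for illustration
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "i", "you"}

def text_process_safe(sms):
    """Tokenize sms, dropping punctuation and stop words; non-strings give []."""
    # None and float NaN (pandas' usual missing value) are not strings, so skip them
    if not isinstance(sms, str):
        return []
    # Remove punctuation characters
    nonpunc = "".join(char for char in sms if char not in string.punctuation)
    # Split on whitespace and drop stop words case-insensitively
    return [word for word in nonpunc.split() if word.lower() not in STOP_WORDS]

print(text_process_safe("Hello, you are the winner!"))  # ['Hello', 'are', 'winner']
print(text_process_safe(None))  # []
```

Skipping bad cells silently is a trade-off: it keeps `fit` from crashing, but it can also hide a data problem like the one described in the answer below.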

Next step:

data['sms'].head(5).apply(text_process)

Next step:

from sklearn.feature_extraction.text import CountVectorizer
bow_transformer = CountVectorizer(analyzer = text_process).fit(data['sms'])

I get an error:

  ---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-84-f1812582c7e1> in <module>
1 #Step 1
2 from sklearn.feature_extraction.text import CountVectorizer
----> 3 bow_transformer = CountVectorizer(analyzer = text_process).fit(data['sms'])

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit(self, raw_documents, y)
976 self
977 """
--> 978 self.fit_transform(raw_documents)
979 return self
980

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
1010
1011 vocabulary, X = self._count_vocab(raw_documents,
-> 1012 self.fixed_vocabulary_)
1013
1014 if self.binary:

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab)
920 for doc in raw_documents:
921 feature_counter = {}
--> 922 for feature in analyze(doc):
923 try:
924 feature_idx = vocabulary[feature]

<ipython-input-82-4149ae75d7bf> in text_process(sms)
3 def text_process(sms):
4
----> 5 nonpunc = [char for char in sms if char not in string.punctuation]
6 nonpunc = ''.join(nonpunc)
7 return[word for word in nonpunc.split() if word.lower() not in stopwords.words('english')]

TypeError: 'NoneType' object is not iterable
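The traceback shows where the failure happens: `_count_vocab` runs `for doc in raw_documents:` and then `for feature in analyze(doc):`, so the analyzer (here `text_process`) is called on every raw cell, and `for char in sms` fails as soon as a cell is `None`. The following is a simplified, hypothetical re-creation of that loop (`count_vocab` and `chars` are illustrative names, not sklearn's actual code) that reproduces the same error:

```python
from collections import Counter

def count_vocab(raw_documents, analyze):
    """Hypothetical, simplified version of the loop shown in the traceback."""
    vocabulary = {}
    rows = []
    for doc in raw_documents:
        # analyze(doc) must return an iterable of features (tokens)
        counts = Counter(analyze(doc))
        for feature in counts:
            # Give each previously unseen feature the next column index
            vocabulary.setdefault(feature, len(vocabulary))
        rows.append({vocabulary[f]: n for f, n in counts.items()})
    return vocabulary, rows

def chars(doc):
    # Like text_process, this iterates over the document character by character
    return [c for c in doc if c.isalpha()]

vocab, rows = count_vocab(["ab", "ba"], chars)
print(vocab)  # {'a': 0, 'b': 1}

try:
    count_vocab(["ab", None], chars)
except TypeError as exc:
    print(exc)  # 'NoneType' object is not iterable
```

The analyzer never sees the DataFrame as a whole, only one cell at a time, which is why a single `None` anywhere in the column is enough to stop `fit`.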

Best answer

There were NaN values in my data. I had used a regular expression, and it caused all of the data to be removed.
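The practical fix is to make sure every value in `data['sms']` is a string before fitting. A minimal sketch, assuming a toy DataFrame in place of the real data: pandas treats both `None` and `NaN` as missing, so `isna` counts them, `dropna` removes those rows, and `fillna('')` instead keeps them as empty documents.

```python
import pandas as pd

# Toy data standing in for the real SMS column; one cell is None, one is NaN
data = pd.DataFrame({"sms": ["Free entry now!", None, "See you at 5", float("nan")]})

# Quick check: how many cells are missing?
print(data["sms"].isna().sum())  # 2

# Option 1: drop rows whose text is missing
clean = data.dropna(subset=["sms"]).reset_index(drop=True)
print(clean["sms"].tolist())  # ['Free entry now!', 'See you at 5']

# Option 2: keep the rows but treat missing text as an empty document
filled = data["sms"].fillna("")
print(filled.tolist())  # ['Free entry now!', '', 'See you at 5', '']
```

Either cleaned column can then be passed to `CountVectorizer(analyzer=text_process).fit(...)` as in the question. Dropping is usually preferable when the missing rows carry no information; filling keeps row alignment with other columns such as labels.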

Regarding python - 'NoneType' object is not iterable for Vectorizer sklearn, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52955249/
