
Python TfidfVectorizer raises TypeError: expected string or bytes-like object on a CSV file

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:50:46

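I am analyzing a very large CSV file and trying to extract tf-idf information from it with scikit-learn. Unfortunately, the processing never finishes because it throws this TypeError. Is there a way to change the CSV file programmatically to get rid of the error? Here is my code: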

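    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer

    df = pd.read_csv("C:/Users/aidan/Downloads/papers/papers.csv", sep=None)
    # Intended to remove missing values; note that this masks cell by cell
    # and does not drop any rows, so NaN values stay in the frame
    df = df[pd.notnull(df)]

    n_features = 1000
    n_topics = 8
    n_top_words = 10
    tfidf_vectorizer = TfidfVectorizer(max_df=0.95, min_df=2, max_features=n_features,
                                       stop_words='english', lowercase=False)

    tfidf = tfidf_vectorizer.fit_transform(df['paper_text'])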

The error is raised by the last line. Thanks in advance!

    Traceback (most recent call last):
      File "C:\Users\aidan\NIPS Analysis 2.0.py", line 35, in <module>
        tfidf = tfidf_vectorizer.fit_transform(df['paper_text'])
      File "c:\python\python36\lib\site-packages\sklearn\feature_extraction\text.py", line 1352, in fit_transform
        X = super(TfidfVectorizer, self).fit_transform(raw_documents)
      File "c:\python\python36\lib\site-packages\sklearn\feature_extraction\text.py", line 839, in fit_transform
        self.fixed_vocabulary_)
      File "c:\python\python36\lib\site-packages\sklearn\feature_extraction\text.py", line 762, in _count_vocab
        for feature in analyze(doc):
      File "c:\python\python36\lib\site-packages\sklearn\feature_extraction\text.py", line 241, in <lambda>
        tokenize(preprocess(self.decode(doc))), stop_words)
      File "c:\python\python36\lib\site-packages\sklearn\feature_extraction\text.py", line 216, in <lambda>
        return lambda doc: token_pattern.findall(doc)
    TypeError: expected string or bytes-like object

Best answer

Have you checked df.dtypes? What does it output?
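Checking df.dtypes only gives a column-level view, so it may help to look at the individual values as well. The traceback ends in token_pattern.findall(doc), which suggests that at least one entry in paper_text is not a string (an empty cell read as NaN is a likely candidate, though that is a guess about this file). Here is a minimal sketch of such a check. The inline CSV text is a made-up stand-in for papers.csv, and bad_rows is just an illustrative name:

```python
import io

import pandas as pd

# Hypothetical stand-in for papers.csv: row 2 has an empty paper_text cell.
csv_text = (
    "id,paper_text\n"
    "1,neural networks learn features\n"
    "2,\n"
    "3,kernel methods and margins\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Column-level view: a text column can still hold non-string values such as NaN.
print(df.dtypes)

# Value-level view: count the Python type of every entry in the text column.
type_counts = df["paper_text"].map(type).value_counts()
print(type_counts)

# Rows whose paper_text is not a str are the ones the tokenizer cannot process.
is_text = df["paper_text"].map(lambda value: isinstance(value, str))
bad_rows = df[~is_text]
print(bad_rows)
```

If bad_rows comes back empty on the real file, the cause is probably something else and the dtypes output would be the next thing to look at.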

You could try passing dtype=str as an argument to the .read_csv() call.
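One caveat, as far as I can tell: with the default NA handling, dtype=str still reads empty cells as NaN, so the missing rows may need to be dropped or filled before vectorizing. Also, df[pd.notnull(df)] in the question masks individual cells rather than dropping rows, so it would not remove them. A rough sketch of the whole fix, using a made-up inline CSV in place of papers.csv and leaving out the max_df / min_df / max_features settings because the sample is tiny:

```python
import io

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for papers.csv: row 2 has an empty paper_text cell.
csv_text = (
    "id,paper_text\n"
    "1,neural networks learn features\n"
    "2,\n"
    "3,kernel methods learn margins\n"
    "4,neural kernel features\n"
)

# dtype=str reads every column as text; empty cells still come back as NaN.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Option A: drop the rows that have no text at all.
docs = df.dropna(subset=["paper_text"])["paper_text"]

# Option B (alternative): keep every row and treat missing text as an empty document.
# docs = df["paper_text"].fillna("")

# Same vectorizer idea as in the question, minus the frequency cut-offs.
tfidf_vectorizer = TfidfVectorizer(stop_words="english", lowercase=False)
tfidf = tfidf_vectorizer.fit_transform(docs)

# One row per remaining document, one column per vocabulary term.
print(tfidf.shape)
```

Option A changes the number of rows, so any labels or ids kept elsewhere would need the same filter. Option B keeps the row alignment at the cost of some all-zero rows in the matrix.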

A similar question about Python TfidfVectorizer raising "TypeError: expected string or bytes-like object" on a CSV file can be found on Stack Overflow: https://stackoverflow.com/questions/43946259/
