
python - Trouble reading a Pandas DataFrame with scikit-learn

Reposted · Author: 太空宇宙 · Updated: 2023-11-04 04:24:46

I'm new to Python and am having trouble using scikit-learn on a DataFrame I created with Pandas. Here is the code:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import json
%matplotlib inline

data = pd.read_json('C:/Users/Desktop/Machine Learning/yelp_academic_dataset_business.json', lines=True, orient='columns', encoding='utf-8')
dataframe = pd.DataFrame(data)

list(dataframe)
subset_data = dataframe.loc[(dataframe.city == 'Toronto')]
print(subset_data)
# to_dict('records') returns a list of dicts (one per row), not a list of strings
documents = subset_data.to_dict('records')

from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

no_features = 1000

# NMF is able to use tf-idf
tfidf_vectorizer = TfidfVectorizer(max_df=0.95, min_df=2, max_features=no_features, stop_words='english')
tfidf = tfidf_vectorizer.fit_transform(documents)
tfidf_feature_names = tfidf_vectorizer.get_feature_names_out()  # get_feature_names() before scikit-learn 1.0

# LDA can only use raw term counts because it is a probabilistic graphical model
tf_vectorizer = CountVectorizer(max_df=0.95, min_df=2, max_features=no_features, stop_words='english')
tf = tf_vectorizer.fit_transform(documents)
tf_feature_names = tf_vectorizer.get_feature_names_out()  # get_feature_names() before scikit-learn 1.0

Here is the error I get:

AttributeError: 'dict' object has no attribute 'lower'
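A minimal reproduction, assuming a small hypothetical DataFrame in place of the Yelp file (the `text` column name and its contents are made up): `to_dict('records')` turns every row into a dict, and the vectorizer's default preprocessing lowercases each document with `.lower()`, which a dict does not have. Passing a single column of strings instead avoids the error.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for subset_data; the 'text' column is an assumption.
subset_data = pd.DataFrame({
    "city": ["Toronto", "Toronto"],
    "text": ["Great pizza and friendly staff", "Cheap pizza, slow service"],
})

documents = subset_data.to_dict("records")
print(type(documents[0]))  # each "document" is a dict, not a string

try:
    TfidfVectorizer().fit_transform(documents)
except AttributeError as exc:
    # Raised when the default lowercasing step calls .lower() on a dict
    print("AttributeError:", exc)

# A Series of strings works: one row of the sparse matrix per document.
tfidf = TfidfVectorizer().fit_transform(subset_data["text"])
print(tfidf.shape)
```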

The dataset is available at kaggle.com/yelp-dataset/yelp-dataset (file: yelp_academic_dataset_business.json).
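The business file stores one JSON object per line, which is why the code passes `lines=True`. A sketch on a made-up two-line sample (only a few illustrative fields, not the full Yelp schema) shows that `pd.read_json` already returns a DataFrame, so wrapping it in `pd.DataFrame(...)` is not needed, and how the city filter behaves.

```python
from io import StringIO

import pandas as pd

# Made-up sample in the same one-object-per-line layout as the Yelp file.
sample = StringIO(
    '{"name": "Pizza Place", "city": "Toronto", "categories": "Pizza, Restaurants"}\n'
    '{"name": "Taco Spot", "city": "Phoenix", "categories": "Mexican, Restaurants"}\n'
)

business = pd.read_json(sample, lines=True)  # already a DataFrame
print(type(business).__name__)
print(list(business))  # column names, like list(dataframe) in the question

# Boolean mask keeps only the Toronto rows
toronto = business.loc[business["city"] == "Toronto"]
print(len(toronto))
```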

Any help would be greatly appreciated. Thanks.

Best answer

As @Jarad mentioned, you have to pass a list or Series of strings to tfidf_vectorizer, not a list of dicts. So the fix for your problem is:

tfidf = tfidf_vectorizer.fit_transform(subset_data['text_column'])  # 'text_column' is a placeholder for the column that holds the text

Regarding "python - Trouble reading a Pandas DataFrame with scikit-learn", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53716439/
