
python - Problems vectorizing specific columns with scikit-learn DictVectorizer?

Reposted. Author: 行者123. Updated: 2023-11-30 09:12:34

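I want to understand how to do a simple prediction task. I am playing with this dataset, which is also available here in a different format. It is about students' performance in certain classes, and I would like to vectorize only some columns of the dataset rather than use all the data (just to understand how it works). So I tried the following, using DictVectorizer: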

import pandas as pd
from sklearn.feature_extraction import DictVectorizer

training_data = pd.read_csv('/Users/user/Downloads/student/student-mat.csv')

dict_vect = DictVectorizer(sparse=False)

training_matrix = dict_vect.fit_transform(training_data['G1','G2','sex','school','age'])
training_matrix.toarray()

Then I would like to pass another feature row, like this:

testing_data = pd.read_csv('/Users/user/Downloads/student/student-mat_test.csv')
test_matrix = dict_vect.transform(testing_data['G1','G2','sex','school','age'])

The problem is that I get the following traceback:

/usr/local/Cellar/python/2.7.8_1/Frameworks/Python.framework/Versions/2.7/bin/python2.7 school_2.py
Traceback (most recent call last):
File "/Users/user/PycharmProjects/PAN-pruebas/escuela_2.py", line 14, in <module>
X = dict_vect.fit_transform(df['sex','age','address','G1','G2'].values)
File "school_2.py", line 1787, in __getitem__
return self._getitem_column(key)
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 1794, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/lib/python2.7/site-packages/pandas/core/generic.py", line 1079, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 2843, in get
loc = self.items.get_loc(item)
File "/usr/local/lib/python2.7/site-packages/pandas/core/index.py", line 1437, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)
File "pandas/hashtable.pyx", line 697, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12349)
File "pandas/hashtable.pyx", line 705, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12300)
KeyError: ('sex', 'age', 'address', 'G1', 'G2')

Process finished with exit code 1

Any idea how to correctly vectorize both datasets (i.e. training and testing), and show both matrices with .toarray()?

Update

>>>print training_data.info()
/usr/local/Cellar/python/2.7.8_1/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/user/PycharmProjects/PAN-pruebas/escuela_3.py
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 396 entries, (school, sex, age, address, famsize, Pstatus, Medu, Fedu, Mjob, Fjob, reason, guardian, traveltime, studytime, failures, schoolsup, famsup, paid, activities, nursery, higher, internet, romantic, famrel, freetime, goout, Dalc, Walc, health, absences) to (MS, M, 19, U, LE3, T, 1, 1, other, at_home, course, father, 1, 1, 0, no, no, no, no, yes, yes, yes, no, 3, 2, 3, 3, 3, 5, 5)
Data columns (total 3 columns):
id 396 non-null object
content 396 non-null object
label 396 non-null object
dtypes: object(3)
memory usage: 22.7+ KB
None

Process finished with exit code 0

Best answer

You need to pass a list:

test_matrix = dict_vect.transform(testing_data[['G1','G2','sex','school','age']])

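What you did was try to index the df with the following single key (the comma-separated names are packed into one tuple):

('G1','G2','sex','school','age')

That is why you get a KeyError: there is no single column with that name. To select multiple columns, pass a list of column names inside the subscript, i.e. double brackets: [[col_list]]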
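As a small illustration of why the key arrives as one tuple, the sketch below uses a hypothetical helper class, ShowKey (not part of pandas), that just returns whatever key it is indexed with. It only demonstrates how Python builds the subscript key, not how pandas resolves it.

```python
class ShowKey:
    """Hypothetical helper: returns the key it was indexed with."""

    def __getitem__(self, key):
        return key


probe = ShowKey()

# Comma-separated subscripts are packed by Python into a single tuple.
tuple_key = probe['a', 'b']

# The inner brackets build a list, which is passed as one key.
list_key = probe[['a', 'b']]

print(type(tuple_key).__name__, tuple_key)
print(type(list_key).__name__, list_key)
```

pandas treats a tuple key as one column label, whereas a list key is treated as a selection of several columns.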

Example:

In [43]:

df = pd.DataFrame(columns=['a','b'])
df
Out[43]:
Empty DataFrame
Columns: [a, b]
Index: []
In [44]:

df['a','b']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-44-33332c7e7227> in <module>()
----> 1 df['a','b']

......
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12349)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12300)()

KeyError: ('a', 'b')

But this works:

In [45]:

df[['a','b']]
Out[45]:
Empty DataFrame
Columns: [a, b]
Index: []
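Once the columns are selected with a list, there is one more step to consider: DictVectorizer is documented to take an iterable of dict-like records, not a DataFrame. A minimal sketch of one way to bridge the two, assuming that converting each row with to_dict(orient='records') is acceptable, could look like this. The small DataFrames are invented stand-ins for student-mat.csv and student-mat_test.csv, and the values are made up for illustration.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Invented stand-ins for the two CSV files (values are made up).
training_data = pd.DataFrame({
    'G1': [5, 12, 15],
    'G2': [6, 11, 14],
    'sex': ['F', 'M', 'F'],
    'school': ['GP', 'GP', 'MS'],
    'age': [18, 17, 19],
    'address': ['U', 'R', 'U'],  # extra column, deliberately left out below
})
testing_data = pd.DataFrame({
    'G1': [10], 'G2': [9], 'sex': ['M'],
    'school': ['MS'], 'age': [16], 'address': ['U'],
})

cols = ['G1', 'G2', 'sex', 'school', 'age']

dict_vect = DictVectorizer(sparse=False)

# Select the columns with a list, then turn each row into a dict.
train_records = training_data[cols].to_dict(orient='records')
test_records = testing_data[cols].to_dict(orient='records')

# Fit on the training rows only; reuse the same mapping for the test rows.
training_matrix = dict_vect.fit_transform(train_records)
test_matrix = dict_vect.transform(test_records)

print(dict_vect.feature_names_)
print(training_matrix)
print(test_matrix)
```

With sparse=False the result is already a NumPy array, so calling .toarray() is not needed; that method only applies to the sparse matrix returned when sparse=True. String columns such as sex and school are one-hot encoded, while numeric columns are passed through as they are.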

Regarding "python - Problems vectorizing specific columns with scikit-learn DictVectorizer?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/29975033/
