
python - AttributeError: 'CategoricalBlock' object has no attribute 'sp_index'

Reposted. Author: 行者123. Updated: 2023-12-01 02:55:28

This error is very strange; I can't find anything about it on Google.

I'm trying to one-hot encode a column in an existing sparse DataFrame.

combined_cats is the list of all possible categories. column_name is a generic column name.

df[column_name] = df[column_name].astype('category', categories=combined_cats,copy=False)
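The call on this line is meant to give the column a fixed set of categories, so that pd.get_dummies later emits the same dummy columns for every frame, even for values that never occur in one of them. Passing categories= to astype has since been removed from pandas; a minimal sketch of the same idea with pd.CategoricalDtype, assuming a recent pandas and illustrative data (the Series s and its values are hypothetical):

```python
import pandas as pd

# Illustrative values: "flowers" never occurs in this Series
combined_cats = ["flowers", "potato", "tomato"]
s = pd.Series(["potato", "tomato", "potato"])

# Modern spelling of s.astype('category', categories=combined_cats)
cat = s.astype(pd.CategoricalDtype(categories=combined_cats))

# get_dummies emits one column per category, observed or not
dummies = pd.get_dummies(cat)
print(list(dummies.columns))  # ['flowers', 'potato', 'tomato']
```

Because the categories are part of the dtype, the unobserved "flowers" still gets an (all-zero) column.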

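However, this fails with the error in the title. I assumed you can't one-hot encode a sparse matrix, but I can't convert it back to a dense one with to_dense() either, because it says numpy ndarray has no such method.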
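to_dense() is a method of pandas sparse objects, not of NumPy arrays, so it fails once the column has already been turned into an ndarray. In current pandas, get_dummies(..., sparse=True) returns an ordinary DataFrame in which only the dummy columns have a SparseDtype. A minimal sketch of densifying just those columns, assuming a recent pandas and the question's toy data:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "b", "c", "d"],
                   "col2": ["potato", "tomato", "potato", "tomato"],
                   "col3": [1, 1, 1, 1]})
encoded = pd.get_dummies(df, columns=["col1"], sparse=True)

# Only the dummy columns are sparse; col2 and col3 pass through unchanged
sparse_cols = [c for c in encoded.columns
               if isinstance(encoded[c].dtype, pd.SparseDtype)]

dense = encoded.copy()
for c in sparse_cols:
    # Series.sparse.to_dense() returns an ordinary (dense) Series
    dense[c] = dense[c].sparse.to_dense()
```

Working column by column avoids the DataFrame-level .sparse accessor, which requires every column to be sparse.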

I tried using as_matrix() and reassigning the column:

df[column_name] = df[column_name].as_matrix()
df[column_name] = df[column_name].astype('category', categories=combined_cats,copy=False)

That didn't work either. Am I doing something wrong? The error occurs when I try to use combined_cats.

For example:

import pandas as pd

def hot_encode_column_in_both_datasets(column_name, df, df2, sparse=True):
    col1b = set(df2[column_name].unique())
    col1a = set(df[column_name].unique())
    combined_cats = list(col1a.union(col1b))
    df[column_name] = df[column_name].astype('category', categories=combined_cats, copy=False)
    df2[column_name] = df2[column_name].astype('category', categories=combined_cats, copy=False)

    df = pd.get_dummies(df, columns=[column_name], sparse=sparse)
    df2 = pd.get_dummies(df2, columns=[column_name], sparse=sparse)
    try:
        # get_dummies has normally dropped column_name already
        del df[column_name]
        del df2[column_name]
    except KeyError:
        pass
    return df, df2



df = pd.DataFrame({"col1":['a','b','c','d'],"col2":["potato","tomato","potato","tomato"],"col3":[1,1,1,1]})
df2 = pd.DataFrame({"col1":['g','b','q','r'],"col2":["potato","flowers","potato","flowers"],"col3":[1,1,1,1]})

## Hot encode col1
df,df2 = hot_encode_column_in_both_datasets("col1",df,df2)

len(df.columns) #9
len(df2.columns) #9

## Hot encode col2 as well
df,df2 = hot_encode_column_in_both_datasets("col2",df,df2)

Traceback (most recent call last):
  File "<ipython-input-44-d8e27874a25b>", line 1, in <module>
    df,df2 = hot_encode_column_in_both_datasets("col2",df,df2)
  File "<ipython-input-34-5ae1e71bbbd5>", line 331, in hot_encode_column_in_both_datasets
    df[column_name] = df[column_name].astype('category', categories=combined_cats,copy=False)
  File "/storage/programfiles/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 2419, in __setitem__
    self._set_item(key, value)
  File "/storage/programfiles/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 2485, in _set_item
    value = self._sanitize_column(key, value)
  File "/storage/programfiles/anaconda3/lib/python3.5/site-packages/pandas/sparse/frame.py", line 324, in _sanitize_column
    clean = value.reindex(self.index).as_sparse_array(
  File "/storage/programfiles/anaconda3/lib/python3.5/site-packages/pandas/sparse/series.py", line 573, in reindex
    return self.copy()
  File "/storage/programfiles/anaconda3/lib/python3.5/site-packages/pandas/sparse/series.py", line 555, in copy
    return self._constructor(new_data, sparse_index=self.sp_index,
  File "/storage/programfiles/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 2744, in __getattr__
    return object.__getattribute__(self, name)
  File "/storage/programfiles/anaconda3/lib/python3.5/site-packages/pandas/sparse/series.py", line 242, in sp_index
    return self.block.sp_index
AttributeError: 'CategoricalBlock' object has no attribute 'sp_index'
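For comparison, a minimal sketch of the question's helper written against current pandas (1.x or 2.x), where SparseDataFrame no longer exists and get_dummies(..., sparse=True) leaves the columns it does not encode dense, so a second call can still cast col2 to a categorical. This is not the accepted answer's approach, and it assumes the category values are sortable (strings here):

```python
import pandas as pd

def hot_encode_column_in_both_datasets(column_name, df, df2, sparse=True):
    # Union of the values seen in either frame; sorted() assumes comparable values
    combined_cats = sorted(set(df[column_name]) | set(df2[column_name]))
    dtype = pd.CategoricalDtype(categories=combined_cats)

    # astype with a CategoricalDtype replaces astype('category', categories=...)
    df = df.astype({column_name: dtype})
    df2 = df2.astype({column_name: dtype})

    # get_dummies drops column_name itself, so no explicit del is needed
    df = pd.get_dummies(df, columns=[column_name], sparse=sparse)
    df2 = pd.get_dummies(df2, columns=[column_name], sparse=sparse)
    return df, df2


df = pd.DataFrame({"col1": ['a', 'b', 'c', 'd'],
                   "col2": ["potato", "tomato", "potato", "tomato"],
                   "col3": [1, 1, 1, 1]})
df2 = pd.DataFrame({"col1": ['g', 'b', 'q', 'r'],
                    "col2": ["potato", "flowers", "potato", "flowers"],
                    "col3": [1, 1, 1, 1]})

df, df2 = hot_encode_column_in_both_datasets("col1", df, df2)
df, df2 = hot_encode_column_in_both_datasets("col2", df, df2)
print(len(df.columns), len(df2.columns))  # 11 11
```

Both frames end up with col3, seven col1_* columns and three col2_* columns, in the same order.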

Best answer

As I said before, I would use the CountVectorizer approach in this case.

Demo:

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Vocabulary = union of both frames' values, so r1 and r2 get identical columns
cv = CountVectorizer(vocabulary=np.union1d(df.col2, df2.col2))

r1 = pd.SparseDataFrame(cv.fit_transform(df.col2),
                        columns=cv.get_feature_names(),
                        index=df.index, default_fill_value=0)

r2 = pd.SparseDataFrame(cv.fit_transform(df2.col2),
                        columns=cv.get_feature_names(),
                        index=df2.index, default_fill_value=0)

Note: the pd.SparseDataFrame(sparse_array) constructor is new in Pandas 0.20.0, so this solution requires Pandas 0.20.0+.

Result:

In [15]: r1
Out[15]:
flowers potato tomato
0 0.0 1 0
1 0.0 0 1
2 0.0 1 0
3 0.0 0 1

In [16]: r2
Out[16]:
flowers potato tomato
0 0 1 0.0
1 1 0 0.0
2 0 1 0.0
3 1 0 0.0

Note the memory usage:

In [17]: r1.memory_usage()
Out[17]:
Index 80
flowers 0 # 0 * 8 bytes
potato 16 # 2 * 8 bytes (int64)
tomato 16 # ...
dtype: int64

In [18]: r2.memory_usage()
Out[18]:
Index 80
flowers 16
potato 16
tomato 0
dtype: int64
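These numbers reflect that a sparse column physically stores only the values that differ from its fill value (0 here), so the all-zero flowers column of r1 takes no space for values. A minimal sketch with pd.arrays.SparseArray illustrating this, assuming a recent pandas; exact byte counts reported by memory_usage() depend on the pandas version:

```python
import pandas as pd

# The potato column of r1 from the example: 1s at rows 0 and 2
potato = pd.arrays.SparseArray([1, 0, 1, 0], fill_value=0)
flowers = pd.arrays.SparseArray([0, 0, 0, 0], fill_value=0)

# Only values different from fill_value are physically stored
print(potato.sp_values)   # [1 1]
print(flowers.sp_values)  # []
print(potato.density, flowers.density)  # 0.5 0.0
```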

A similar question about python - AttributeError: 'CategoricalBlock' object has no attribute 'sp_index' can be found on Stack Overflow: https://stackoverflow.com/questions/44246842/
