
python - Pandas: DataFrame.unstack error

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:23:28

I wrote the following function to convert several columns of a data frame to numeric values:

def factorizeMany(data, columns):
    """ Factorize a bunch of columns in a data frame"""
    data[columns] = data[columns].stack().rank(method='dense').unstack()
    return data

Calling it like this

trainDataPre = factorizeMany(trainDataMerged.fillna(0), columns=["char_{0}".format(i) for i in range(1,10)])

gives me an error. I don't know where to look for the cause. Could it be a problem with the input?

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-14-357f8a4b2ef8> in <module>()
1 #trainDataPre = trainDataMerged.drop(["people_id", "activity_id", "date"], axis=1)
2 #trainDataPre = trainDataMerged.fillna(0)
----> 3 trainDataPre = mininggear.factorizeMany(trainDataMerged.fillna(0), columns=["char_{0}".format(i) for i in range(1,10)])

/Users/cls/Dropbox/Datengräber/Kaggle/RedHat/mininggear.py in factorizeMany(data, columns)
15 def factorizeMany(data, columns):
16 """ Factorize a bunch of columns in a data frame"""
---> 17 data[columns] = data[columns].stack().rank(method='dense').unstack()
18 return data
19

/usr/local/lib/python3.5/site-packages/pandas/core/series.py in unstack(self, level, fill_value)
2041 """
2042 from pandas.core.reshape import unstack
-> 2043 return unstack(self, level, fill_value)
2044
2045 # ----------------------------------------------------------------------

/usr/local/lib/python3.5/site-packages/pandas/core/reshape.py in unstack(obj, level, fill_value)
405 else:
406 unstacker = _Unstacker(obj.values, obj.index, level=level,
--> 407 fill_value=fill_value)
408 return unstacker.get_result()
409

/usr/local/lib/python3.5/site-packages/pandas/core/reshape.py in __init__(self, values, index, level, value_columns, fill_value)
90
91 # when index includes `nan`, need to lift levels/strides by 1
---> 92 self.lift = 1 if -1 in self.index.labels[self.level] else 0
93
94 self.new_index_levels = list(index.levels)

AttributeError: 'Index' object has no attribute 'labels'

Best answer

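The error occurs because you filled the NaNs in the data frame with 0 before calling the function, so the rank operation ran on a subset of the data frame that mixes numeric values with categorical/string values.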

Consider this case:

import numpy as np
import pandas as pd

df = pd.DataFrame({'char_1': ['cat', 'dog', 'buffalo', 'cat'],
                   'char_2': ['mouse', 'tiger', 'lion', 'mouse'],
                   'char_3': ['giraffe', np.nan, 'cat', np.nan]})
df


df = df.fillna(0)
df[['char_3']].stack().rank()
Series([], dtype: float64)

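So you are essentially calling unstack on an empty Series, which is not what you intended. An empty Series carries a plain Index rather than the MultiIndex that unstack needs, which is why the traceback ends with "'Index' object has no attribute 'labels'".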
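To make the index requirement concrete, here is a minimal sketch. The hand-built MultiIndex stands in for what DataFrame.stack() would produce and is an assumption for illustration, not the asker's data. The exact exception type depends on the pandas version (AttributeError in the version from the traceback, ValueError in more recent releases, as far as I know), so the example catches both.

```python
import pandas as pd

# A Series indexed by (row, column) pairs, similar in shape to the
# result of DataFrame.stack(). Built by hand here for illustration.
index = pd.MultiIndex.from_product([[0, 1], ['char_1', 'char_2']])
stacked = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Works: the inner index level is pivoted back into columns.
wide = stacked.unstack()
print(wide)

# An empty Series has a plain (non-Multi) index, so there is no
# level to pivot and unstack fails.
empty = pd.Series([], dtype='float64')
try:
    empty.unstack()
    failed = False
except (AttributeError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)
```

Checking isinstance(series.index, pd.MultiIndex) before calling unstack is a quick way to catch this situation early.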

A better approach that avoids further complications:

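import pandas

def factorizeMany(data, columns):
    """ Factorize a bunch of columns in a data frame"""
    # Stack the columns into one long Series, keeping missing values
    stacked = data[columns].stack(dropna=False)
    # Encode each distinct value as an integer code, then restore the wide shape
    data[columns] = pandas.Series(stacked.factorize()[0], index=stacked.index).unstack()
    return data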
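For comparison, here is a sketch of the same idea without stack/unstack. It is not the answer's method: it flattens the selected columns with NumPy and encodes them with pd.factorize, so that all columns share one set of codes. The helper name factorize_shared is hypothetical, and the sample frame is the one from the example above. It leaves NaN in place instead of filling with 0, because pd.factorize assigns missing values the code -1 by default.

```python
import numpy as np
import pandas as pd


def factorize_shared(data, columns):
    """Encode several columns with one shared set of integer codes.

    Hypothetical helper for illustration; returns a new frame and the
    array of distinct values, indexed by code.
    """
    # Flatten the selected columns row by row into a 1-D object array.
    values = data[columns].to_numpy().ravel()
    # Codes are assigned in order of first appearance; NaN becomes -1.
    codes, uniques = pd.factorize(values)
    codes_2d = codes.reshape(len(data), len(columns))
    out = data.copy()
    for i, col in enumerate(columns):
        out[col] = codes_2d[:, i]
    return out, uniques


df = pd.DataFrame({'char_1': ['cat', 'dog', 'buffalo', 'cat'],
                   'char_2': ['mouse', 'tiger', 'lion', 'mouse'],
                   'char_3': ['giraffe', np.nan, 'cat', np.nan]})

encoded, uniques = factorize_shared(df, ['char_1', 'char_2', 'char_3'])
print(encoded)
print(list(uniques))
```

Because the codes are shared, 'cat' maps to the same integer in char_1 and char_3. Returning a copy rather than modifying the input is a design choice of this sketch; the function in the answer modifies data in place.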

This post on python - Pandas: DataFrame.unstack error is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/39546975/
