
python - Storing a Pandas DataFrame with a MultiIndex on both axes in HDF5

Reposted. Author: 行者123. Updated: 2023-11-28 18:31:56

I would expect it to be possible to store a DataFrame that has a MultiIndex on both axes. However, I get the following error:

In [1]: import numpy as np
   ...: import pandas as pd
   ...: index = pd.MultiIndex.from_product([['Foo', 'Bar'], ['One', 'Two', 'Three']])
   ...: column = pd.MultiIndex.from_product([['foo', 'bar'], ['one', 'two', 'three']])
   ...: df = pd.DataFrame(np.random.rand(6, 6), index=index, columns=column)
   ...: df
Out[1]:
                foo                           bar
                one       two     three       one       two     three
Foo One    0.605352  0.882382  0.472946  0.615619  0.108022  0.389674
    Two    0.746384  0.594509  0.556881  0.457000  0.529793  0.929574
    Three  0.270978  0.956778  0.515201  0.626850  0.852708  0.861962
Bar One    0.219994  0.648191  0.677824  0.408439  0.079326  0.414059
    Two    0.186167  0.767103  0.880667  0.205253  0.647471  0.449379
    Three  0.353171  0.249900  0.723791  0.458349  0.977604  0.691188

In [2]: with pd.HDFStore('test.h5', 'w') as store:
   ...:     store.append('output', df)
Out[2]: ---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-55-87e96c141a7f> in <module>()
1 with pd.HDFStore('test.h5', 'w') as store:
----> 2 store.append('output', df)

/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py in append(self, key, value, format, append, columns, dropna, **kwargs)
917 kwargs = self._validate_format(format, kwargs)
918 self._write_to_group(key, value, append=append, dropna=dropna,
--> 919 **kwargs)
920
921 def append_to_multiple(self, d, value, selector, data_columns=None,

/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
1262
1263 # write the object
-> 1264 s.write(obj=value, append=append, complib=complib, **kwargs)
1265
1266 if s.is_table and index:

/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py in write(self, obj, data_columns, **kwargs)
4195 data_columns.insert(0, n)
4196 return super(AppendableMultiFrameTable, self).write(
-> 4197 obj=obj, data_columns=data_columns, **kwargs)
4198
4199 def read(self, **kwargs):

/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
3785 self.create_axes(axes=axes, obj=obj, validate=append,
3786 min_itemsize=min_itemsize,
-> 3787 **kwargs)
3788
3789 for a in self.axes:

/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
3383 axis, axis_labels = self.non_index_axes[0]
3384 data_columns = self.validate_data_columns(
-> 3385 data_columns, min_itemsize)
3386 if len(data_columns):
3387 mgr = block_obj.reindex_axis(

/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py in validate_data_columns(self, data_columns, min_itemsize)
3246 if info.get('type') == 'MultiIndex' and data_columns:
3247 raise ValueError("cannot use a multi-index on axis [{0}] with "
-> 3248 "data_columns {1}".format(axis, data_columns))
3249
3250 # evaluate the passed data_columns, True == use all columns

ValueError: cannot use a multi-index on axis [1] with data_columns ['level_1', 'level_0']

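Storing the data this way makes the most sense to me, mainly because my needs vary a lot. For some applications I will need all rows and all columns. For many others I will need all rows but only one parent column: say, all rows under foo. I may also need just one parent row and one parent column: Foo, foo.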

In every case I will need all of the secondary (second-level) rows and columns.

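In my case, the primary row index is the state, the secondary row index is the sensor name, the primary column index is the quantity being sensed, and the secondary column index is a statistic of the sensor output. So it is easy to see that I might need one type of sensed data for all states or for a single state, or everything sensed in one state or in all states.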
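For reference, the access patterns described above look like this on the in-memory frame. This is only a sketch: it rebuilds the example frame with deterministic values (np.arange instead of np.random.rand) so the results are easy to check, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same layout as the frame in the question, but with predictable values.
index = pd.MultiIndex.from_product([["Foo", "Bar"], ["One", "Two", "Three"]])
column = pd.MultiIndex.from_product([["foo", "bar"], ["one", "two", "three"]])
df = pd.DataFrame(np.arange(36.0).reshape(6, 6), index=index, columns=column)

# All rows, one parent column: the top column level is dropped from the result.
foo_all_rows = df["foo"]

# One parent row and one parent column: select on level 0 of each axis.
foo_block = df.xs("Foo", level=0).xs("foo", axis=1, level=0)

print(foo_all_rows.shape)  # (6, 3)
print(foo_block.shape)     # (3, 3)
```

Both selections keep every secondary row and column, which is the requirement stated above.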

I am looking for either a fix for the error or a better way to store the data.

Best answer

You can keep both MultiIndexes if you force format='fixed' when storing df:

with pd.HDFStore('test.h5', 'w') as store:
    # the fixed format writes the frame as-is, including both MultiIndexes
    store.put('output', df, format='fixed')
    print(store['output'])

              foo                     bar
              one     two   three     one     two   three
Foo One    0.9626  0.9761  0.4385  0.2976  0.0882  0.7589
    Two    0.7842  0.7563  0.4796  0.5664  0.1511  0.9345
    Three  0.3364  0.4271  0.4107  0.9009  0.5207  0.4082
Bar One    0.9892  0.4595  0.1485  0.1456  0.9935  0.1386
    Two    0.3187  0.7908  0.2947  0.7354  0.5759  0.9102
    Three  0.0499  0.1865  0.8113  0.4815  0.1427  0.3322

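However, you lose some functionality: a fixed-format store can be neither appended to (for example with the .append() method) nor queried with a where clause. Depending on your needs, this may or may not be a problem.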
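If appending does matter, one possible workaround (not part of the original answer) is to split the frame by parent column. Each piece then has single-level columns, and the table format accepts a MultiIndex on the row axis alone. The sketch below rests on several assumptions: the row levels are named state and sensor after the description in the question, the helpers split_by_parent, combine, write_parts and read_parent are hypothetical names, and the two HDF5 helpers need PyTables to be installed. They are defined here but not called.

```python
import numpy as np
import pandas as pd


def split_by_parent(df):
    """Return {parent column label: sub-frame with single-level columns}."""
    parents = df.columns.get_level_values(0).unique()
    return {parent: df[parent] for parent in parents}


def combine(parts):
    """Rebuild the two-level column MultiIndex from the per-parent pieces."""
    return pd.concat(parts, axis=1)


def write_parts(path, parts, root="output"):
    """Append each piece under its own key, e.g. 'output/foo' (needs PyTables)."""
    with pd.HDFStore(path, mode="a") as store:
        for parent, part in parts.items():
            store.append(f"{root}/{parent}", part, format="table")


def read_parent(path, parent, state=None, root="output"):
    """Read one parent column, optionally restricted to one 'state' label."""
    # Row index levels are stored as queryable columns in the table format.
    where = None if state is None else f'state == "{state}"'
    with pd.HDFStore(path, mode="r") as store:
        return store.select(f"{root}/{parent}", where=where)


# Example frame; the level names 'state' and 'sensor' are assumptions.
index = pd.MultiIndex.from_product(
    [["Foo", "Bar"], ["One", "Two", "Three"]], names=["state", "sensor"]
)
column = pd.MultiIndex.from_product([["foo", "bar"], ["one", "two", "three"]])
df = pd.DataFrame(np.arange(36.0).reshape(6, 6), index=index, columns=column)

parts = split_by_parent(df)   # {'foo': 6x3 frame, 'bar': 6x3 frame}
restored = combine(parts)     # same layout as df again
```

With this layout, write_parts('test.h5', parts) would store the pieces, read_parent('test.h5', 'foo') would return all rows under foo, and read_parent('test.h5', 'foo', state='Foo') would return only the Foo rows. The cost is that reading the full frame takes one read per parent column followed by combine.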

Regarding "python - Storing a Pandas DataFrame with a MultiIndex on both axes in HDF5", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36257665/
