
python - How do I correctly read back a CSV file generated from a groupby result?

Reposted. Author: 行者123. Updated: 2023-12-05 03:32:20

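I computed the mean of a DataFrame column grouped by two sets of bins and saved the result to a CSV file.

Then I tried to read it back with read_csv(), but the .loc[] indexer does not work on the loaded DataFrame.

Here is a code example: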

import pandas as pd
import numpy as np

np.random.seed(100)
df = pd.DataFrame(np.random.randn(100, 3), columns=['a', 'b', 'value'])

a_bins = np.arange(-3, 4, 1)
b_bins = np.arange(-2, 4, 2)

# calculate the mean value
df['a_bins'] = pd.cut(df['a'], bins=a_bins)
df['b_bins'] = pd.cut(df['b'], bins=b_bins)
df_value_bin = df.groupby(['a_bins','b_bins']).agg({'value':'mean'})

# save to csv file
df_value_bin.to_csv('test.csv')

# read the exported file
df_test = pd.read_csv('test.csv')

When I run:

df_value_bin.loc[(1.5, -1)]

I get this output:

value    0.254337
Name: ((1, 2], (-2, 0]), dtype: float64

However, if I use the same approach to look up the value in the DataFrame loaded from the CSV file:

df_test.loc[(1.5, -1)]

I get this KeyError:

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/tmp/ipykernel_33836/4042082162.py in <module>
----> 1 df_test.loc[(1.5, -1)]

~/miniconda3/lib/python3.9/site-packages/pandas/core/indexing.py in __getitem__(self, key)
923 with suppress(KeyError, IndexError):
924 return self.obj._get_value(*key, takeable=self._takeable)
--> 925 return self._getitem_tuple(key)
926 else:
927 # we by definition only have the 0th axis

~/miniconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
1098 def _getitem_tuple(self, tup: tuple):
1099 with suppress(IndexingError):
-> 1100 return self._getitem_lowerdim(tup)
1101
1102 # no multi-index, so validate all of the indexers

~/miniconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
836 # We don't need to check for tuples here because those are
837 # caught by the _is_nested_tuple_indexer check above.
--> 838 section = self._getitem_axis(key, axis=i)
839
840 # We should never have a scalar section here, because

~/miniconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1162 # fall thru to straight lookup
1163 self._validate_key(key, axis)
-> 1164 return self._get_label(key, axis=axis)
1165
1166 def _get_slice_axis(self, slice_obj: slice, axis: int):

~/miniconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
1111 def _get_label(self, label, axis: int):
1112 # GH#5667 this will fail if the label is not present in the axis.
-> 1113 return self.obj.xs(label, axis=axis)
1114
1115 def _handle_lowerdim_multi_index_axis0(self, tup: tuple):

~/miniconda3/lib/python3.9/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
3774 raise TypeError(f"Expected label or tuple of labels, got {key}") from e
3775 else:
-> 3776 loc = index.get_loc(key)
3777
3778 if isinstance(loc, np.ndarray):

~/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
386 except ValueError as err:
387 raise KeyError(key) from err
--> 388 raise KeyError(key)
389 return super().get_loc(key, method=method, tolerance=tolerance)
390

KeyError: 1.5

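Best answer

You should read the first two columns back as a MultiIndex, but you also need to convert the bin labels from strings to intervals. You can use to_interval (all credit to korakot):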

def to_interval(istr):
    # Parse a label such as '(1, 2]' into a pd.Interval.
    c_left = istr[0] == '['
    c_right = istr[-1] == ']'
    closed = {(True, False): 'left',
              (False, True): 'right',
              (True, True): 'both',
              (False, False): 'neither'
              }[c_left, c_right]
    left, right = map(int, istr[1:-1].split(','))
    return pd.Interval(left, right, closed)

df_test = pd.read_csv('test.csv', index_col=[0,1], converters={0: to_interval,1: to_interval})
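For context on why index_col and converters are both needed in the read_csv() call above: a plain read_csv() gives the frame a default RangeIndex and leaves the bin labels as ordinary strings, which matches the last frame of the traceback in the question (pandas/core/indexes/range.py). The sketch below illustrates this with a small in-memory CSV. The CSV text and its values are made up for illustration and are not the contents of test.csv.

```python
import io

import pandas as pd

# Hypothetical CSV text in the same layout that to_csv() writes for the
# grouped result; the values are illustrative only.
csv_text = '''a_bins,b_bins,value
"(1, 2]","(-2, 0]",0.25
"(1, 2]","(0, 2]",0.5
'''

# Plain read: no index_col, no converters.
df_plain = pd.read_csv(io.StringIO(csv_text))

index_type = type(df_plain.index).__name__
first_label = df_plain['a_bins'].iloc[0]

print(index_type)         # RangeIndex
print(repr(first_label))  # '(1, 2]'
```

Because the row index is a RangeIndex, .loc[(1.5, -1)] is interpreted as row label 1.5 and column label -1, and the row lookup fails with KeyError: 1.5.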

Test:

df_test.loc[(1.5, -1)]
#value 0.254337
#Name: ((1, 2], (-2, 0]), dtype: float64
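One caveat: to_interval parses the edges with int, which suits the integer bins in the question but would raise a ValueError on labels such as "(0.5, 1.5]". The following is a minimal sketch of a float-based variant, not part of the original answer. The name to_interval_float and the CSV text are hypothetical and used only for illustration.

```python
import io

import pandas as pd


def to_interval_float(istr):
    """Parse a label such as '(1, 2]' or '[0.5, 1.5)' into a pd.Interval."""
    istr = istr.strip()
    closed = {
        (True, False): 'left',
        (False, True): 'right',
        (True, True): 'both',
        (False, False): 'neither',
    }[istr[0] == '[', istr[-1] == ']']
    # float() accepts both integer and decimal edges.
    left, right = (float(part) for part in istr[1:-1].split(','))
    return pd.Interval(left, right, closed)


# Hypothetical CSV text with non-integer bin edges (illustrative values).
csv_text = '''a_bins,b_bins,value
"(0.5, 1.5]","(-2.0, 0.0]",0.1
"(0.5, 1.5]","(0.0, 2.0]",0.2
"(1.5, 2.5]","(-2.0, 0.0]",0.3
"(1.5, 2.5]","(0.0, 2.0]",0.4
'''

df_demo = pd.read_csv(
    io.StringIO(csv_text),
    index_col=[0, 1],
    converters={0: to_interval_float, 1: to_interval_float},
)

# Spell out the column so the tuple is unambiguously the row key.
value = df_demo.loc[(1.0, -1.0), 'value']
print(value)
```

Passing the column label explicitly is a design choice here: it avoids any ambiguity between a MultiIndex row key and a (row, column) pair.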

Regarding "python - How do I correctly read back a CSV file generated from a groupby result?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70497633/
