python - pandas: applying np.histogram to reshape a DataFrame

Reposted. Author: 行者123. Updated: 2023-11-30 23:22:21

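I want to get a normalized histogram for each column of a pandas DataFrame. np.histogram is what I want to use, but it returns a tuple and I only want its first item, and pandas doesn't seem to like that. For example, this works: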

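import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.uniform(size=20).reshape(5, 4))

bins = (0, 0.5, 1)
# normed= is the spelling from the NumPy of the time; later NumPy releases use density= instead
df.apply(np.histogram, bins=bins, normed=True)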

and returns

0    ([0.8, 1.2], [0.0, 0.5, 1.0])
1    ([0.8, 1.2], [0.0, 0.5, 1.0])
2    ([0.8, 1.2], [0.0, 0.5, 1.0])
3    ([0.8, 1.2], [0.0, 0.5, 1.0])
dtype: object
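For reference, here is a minimal sketch of what np.histogram hands back for a single column, which is where the tuples above come from. It is not part of the original question: the variable names are illustrative, the seeded generator is an assumption made only for reproducibility, and `density=True` is used in place of the older `normed=True`.

```python
import numpy as np

# Five sample values standing in for one DataFrame column (illustrative data).
values = np.random.default_rng(0).uniform(size=5)

# np.histogram returns a 2-tuple: (per-bin values, bin edges).
counts, edges = np.histogram(values, bins=(0, 0.5, 1), density=True)

print(counts.shape)  # (2,)  one value per bin
print(edges.shape)   # (3,)  one more edge than there are bins

# With density=True the values integrate to 1 over the bin widths.
total = (counts * np.diff(edges)).sum()
```

Two bins give two values per column, no matter how many rows the column has.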

But I only want the first item of the tuple, so I tried this:

df.apply(lambda x : np.histogram(x, bins=bins, normed=True)[0]) 

But it raises an error:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-51-3191795e120c> in <module>()
----> 1 df.apply(lambda x : np.histogram(x, bins=bins, normed=True)[0])

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
3310 if reduce is None:
3311 reduce = True
-> 3312 return self._apply_standard(f, axis, reduce=reduce)
3313 else:
3314 return self._apply_broadcast(f, axis)

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
3415 index = None
3416
-> 3417 result = self._constructor(data=results, index=index)
3418 result.columns = res_index
3419

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
199 dtype=dtype, copy=copy)
200 elif isinstance(data, dict):
--> 201 mgr = self._init_dict(data, index, columns, dtype=dtype)
202 elif isinstance(data, ma.MaskedArray):
203 import numpy.ma.mrecords as mrecords

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _init_dict(self, data, index, columns, dtype)
321
322 return _arrays_to_mgr(arrays, data_names, index, columns,
--> 323 dtype=dtype)
324
325 def _init_ndarray(self, values, index, columns, dtype=None,

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
4471 axes = [_ensure_index(columns), _ensure_index(index)]
4472
-> 4473 return create_block_manager_from_arrays(arrays, arr_names, axes)
4474
4475

/usr/local/lib/python2.7/site-packages/pandas/core/internals.pyc in create_block_manager_from_arrays(arrays, names, axes)
3757 return mgr
3758 except (ValueError) as e:
-> 3759 construction_error(len(arrays), arrays[0].shape[1:], axes, e)
3760
3761

/usr/local/lib/python2.7/site-packages/pandas/core/internals.pyc in construction_error(tot_items, block_shape, axes, e)
3729 raise e
3730 raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 3731 passed,implied))
3732
3733 def create_block_manager_from_blocks(blocks, axes):

ValueError: Shape of passed values is (4,), indices imply (4, 5)

> /usr/local/lib/python2.7/site-packages/pandas/core/internals.py(3731)construction_error()
3730 raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 3731 passed,implied))
3732

Any ideas?

Best answer

You can do this if you want.

# Series here is pandas.Series (from pandas import Series)
In [26]: df.apply(lambda x : Series(np.histogram(x, bins=bins, normed=True)[0]))
Out[26]:
     0    1    2    3
0  0.4  1.6  0.8  1.6
1  1.6  0.4  1.2  0.4

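np.histogram is neither a reducer (returning a single value) nor a transformer (returning the same number of values as the input), so apply doesn't know how to map the return value. Here each column has 5 rows but the histogram has only 2 bins, which is the shape mismatch the ValueError reports.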
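The reducer/transformer distinction can be sketched as follows. This is an illustration under assumptions, not the answerer's code: the seeded random data and the variable names are made up for the example, and `density=True` stands in for the older `normed=True`.

```python
import numpy as np
import pandas as pd

# Illustrative 5x4 frame of uniform values in [0, 1).
df = pd.DataFrame(np.random.default_rng(0).uniform(size=(5, 4)))
bins = (0, 0.5, 1)

# Reducer: one scalar per column, so apply returns a Series with one entry per column.
reduced = df.apply(lambda col: col.mean())

# Transformer: one value per input row, so apply returns a frame of the same shape.
transformed = df.apply(lambda col: col - col.mean())

# Neither: 2 values for a 5-row column. Wrapping the result in a Series gives
# apply an explicit index (0, 1) to align the per-column results on.
hist = df.apply(
    lambda col: pd.Series(np.histogram(col, bins=bins, density=True)[0])
)

print(reduced.shape)      # (4,)
print(transformed.shape)  # (5, 4)
print(hist.shape)         # (2, 4)
```

The Series wrapper is what turns the histogram result into something apply can assemble: the rows of the output become the bins and the columns stay the original columns.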

Here is another way (and conceptually how to think about apply):

# Series and concat come from pandas; newer pandas spells iteritems() as items()
In [28]: f = lambda x : Series(np.histogram(x, bins=bins, normed=True)[0])

In [31]: concat([ f(col) for c, col in df.iteritems() ],axis=1)
Out[31]:
     0    1    2    3
0  0.4  1.6  0.8  1.6
1  1.6  0.4  1.2  0.4
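For readers on current library versions, the same per-column loop can be sketched as below. The helper name `column_density` and the seeded data are hypothetical, chosen for this example; `DataFrame.items()` and `density=True` are used as the newer counterparts of `iteritems()` and `normed=True`.

```python
import numpy as np
import pandas as pd

# Illustrative 5x4 frame of uniform values in [0, 1).
df = pd.DataFrame(np.random.default_rng(0).uniform(size=(5, 4)))
bins = (0, 0.5, 1)

def column_density(col):
    """Hypothetical helper: normalized histogram of one column as a Series."""
    counts, _edges = np.histogram(col, bins=bins, density=True)
    # Keep the column label as the Series name so concat can use it as a header.
    return pd.Series(counts, name=col.name)

# Build one small Series per column, then glue them side by side.
result = pd.concat([column_density(col) for _, col in df.items()], axis=1)

print(result.shape)  # (2, 4)
```

Spelling the loop out this way shows what apply is doing: call the function once per column, then line the results up on their shared index.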

Regarding "python - pandas: applying np.histogram to reshape a DataFrame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24542572/
