python - Applying np.histogram with pandas to reshape a DataFrame

Reposted by: 行者123. Last updated: 2023-11-30 23:22:21


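I want to get a normalized histogram of each column of a pandas DataFrame. np.histogram is what I want to use, but it returns a tuple and I only want its first item. pandas doesn't seem to like that. For example, this works: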

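import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.uniform(size=20).reshape(5, 4))

bins = (0, 0.5, 1)
# `normed` was removed in NumPy 1.24; on newer versions pass density=True instead.
df.apply(np.histogram, bins=bins, normed=True)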

and returns

0    ([0.8, 1.2], [0.0, 0.5, 1.0])
1    ([0.8, 1.2], [0.0, 0.5, 1.0])
2    ([0.8, 1.2], [0.0, 0.5, 1.0])
3    ([0.8, 1.2], [0.0, 0.5, 1.0])
dtype: object
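For reference, here is a minimal sketch of what that tuple contains and where values such as 0.8 and 1.2 come from. The five sample values are made up for illustration, and density=True is used in place of normed=True, which recent NumPy releases no longer accept.

```python
import numpy as np

# Hypothetical column: two values fall in [0, 0.5), three in [0.5, 1].
values = np.array([0.1, 0.2, 0.6, 0.7, 0.9])
bins = (0, 0.5, 1)

# np.histogram returns a 2-tuple: (histogram values, bin edges).
hist, edges = np.histogram(values, bins=bins, density=True)

# With density=True each entry is count / (n * bin_width),
# here 2 / (5 * 0.5) and 3 / (5 * 0.5).
print(hist)   # approximately [0.8 1.2]
print(edges)  # the three bin edges 0, 0.5 and 1
```

Multiplying each density by its bin width (0.5) and summing gives 1, which is what "normalized" means here.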

But I only want the first item of the tuple, so I tried this:

df.apply(lambda x : np.histogram(x, bins=bins, normed=True)[0])

But it raises an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-51-3191795e120c> in <module>()
----> 1 df.apply(lambda x : np.histogram(x, bins=bins, normed=True)[0])

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   3310                     if reduce is None:
   3311                         reduce = True
-> 3312                     return self._apply_standard(f, axis, reduce=reduce)
   3313             else:
   3314                 return self._apply_broadcast(f, axis)

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
   3415                 index = None
   3416 
-> 3417             result = self._constructor(data=results, index=index)
   3418             result.columns = res_index
   3419 

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    199                                  dtype=dtype, copy=copy)
    200         elif isinstance(data, dict):
--> 201             mgr = self._init_dict(data, index, columns, dtype=dtype)
    202         elif isinstance(data, ma.MaskedArray):
    203             import numpy.ma.mrecords as mrecords

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _init_dict(self, data, index, columns, dtype)
    321 
    322         return _arrays_to_mgr(arrays, data_names, index, columns,
--> 323                               dtype=dtype)
    324 
    325     def _init_ndarray(self, values, index, columns, dtype=None,

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   4471     axes = [_ensure_index(columns), _ensure_index(index)]
   4472 
-> 4473     return create_block_manager_from_arrays(arrays, arr_names, axes)
   4474 
   4475 

/usr/local/lib/python2.7/site-packages/pandas/core/internals.pyc in create_block_manager_from_arrays(arrays, names, axes)
   3757         return mgr
   3758     except (ValueError) as e:
-> 3759         construction_error(len(arrays), arrays[0].shape[1:], axes, e)
   3760 
   3761 

/usr/local/lib/python2.7/site-packages/pandas/core/internals.pyc in construction_error(tot_items, block_shape, axes, e)
   3729         raise e
   3730     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 3731         passed,implied))
   3732 
   3733 def create_block_manager_from_blocks(blocks, axes):

ValueError: Shape of passed values is (4,), indices imply (4, 5)

> /usr/local/lib/python2.7/site-packages/pandas/core/internals.py(3731)construction_error()
   3730     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 3731         passed,implied))
   3732

Any ideas?

Best answer

If you like, you can do it this way, wrapping the histogram values in a Series:

In [26]: df.apply(lambda x : pd.Series(np.histogram(x, bins=bins, normed=True)[0]))
Out[26]: 
     0    1    2    3
0  0.4  1.6  0.8  1.6
1  1.6  0.4  1.2  0.4

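np.histogram is neither a reducer (returning a single value) nor a transformer (returning the same number of values as the input), so apply does not know how to map the return values.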
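The distinction can be sketched as follows. This is an illustration rather than part of the original answer: it assumes a recent NumPy and pandas, uses density=True instead of normed=True, and the seeded generator and variable names are arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(size=(5, 4)))
bins = (0, 0.5, 1)

# Reducer: one scalar per column -> a Series indexed by column label.
reduced = df.apply(lambda col: col.mean())

# Transformer: as many values as the input -> a DataFrame of the same shape.
transformed = df.apply(lambda col: col - col.mean())

# Neither: two values per column. Returning a Series tells apply how to
# label the result, so the output is a 2 x 4 DataFrame.
hist = df.apply(
    lambda col: pd.Series(np.histogram(col, bins=bins, density=True)[0])
)

print(reduced.shape, transformed.shape, hist.shape)
```

The shapes printed are (4,), (5, 4) and (2, 4): the histogram case has one row per bin rather than one row per input row.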

Here is another way (and how to think about apply conceptually):

In [28]: f = lambda x : pd.Series(np.histogram(x, bins=bins, normed=True)[0])

In [31]: pd.concat([ f(col) for c, col in df.iteritems() ],axis=1)
Out[31]: 
     0    1    2    3
0  0.4  1.6  0.8  1.6
1  1.6  0.4  1.2  0.4
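On current library versions the same idea might look like the sketch below. It uses density=True and DataFrame.items() because normed and iteritems have been removed from recent NumPy and pandas releases. The helper name column_density and the choice to index the result by each bin's left edge are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(size=(5, 4)))
bins = np.array([0.0, 0.5, 1.0])

def column_density(col):
    """Return the normalized histogram of one column as a Series."""
    density, _ = np.histogram(col, bins=bins, density=True)
    # Label each value by the left edge of its bin (an illustrative choice).
    return pd.Series(density, index=bins[:-1])

# Call the function on each column by hand, then glue the pieces together
# side by side; the dict keys become the column labels.
pieces = {name: column_density(col) for name, col in df.items()}
result = pd.concat(pieces, axis=1)
print(result)
```

This is what apply does conceptually: it calls the function once per column and combines the per-column results, which is only well defined when each result carries its own labels.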

Regarding python - Applying np.histogram with pandas to reshape a DataFrame, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24542572/


