python pandas dataframe groupby or pivot_table

Reposted by: 行者123 | Updated: 2023-11-30 21:57:28

Example:

import pandas as pd
data = {'id':[101,101,101,101,102,102,102,102],
    'day':[1,2,1,2,1,2,1,2],
    'year':[2011,2011,2012,2012,2011,2011,2012,2012],
    'avg':[0.500,0.400,0.300,0.200,0.555,0.455,0.355,0.255],
    'sum':[1, 2, 2, 3, 6, 6, 8, 9],
    'div':[2, 1, 3, 2, 6, 1, 6, 3]}
df = pd.DataFrame(data)
df

    id  day year    avg     sum div
0   101 1   2011    0.500   1   2
1   101 2   2011    0.400   2   1
2   101 1   2012    0.300   2   3
3   101 2   2012    0.200   3   2
4   102 1   2011    0.555   6   6
5   102 2   2011    0.455   6   1
6   102 1   2012    0.355   8   6
7   102 2   2012    0.255   9   3

Desired output:

    id  sum div 2011_avg    2012_avg    2011_sum    2012_sum    2011_div    2012_div
0   101 8   8   0.450       0.250       3           5           2           1.5
1   102 29  16  0.505       0.305       12          17          6           2.0

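Right now I build a separate pivot table by year for each column and concatenate them several times.

Can anyone suggest a simpler or more efficient way to get the desired output?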

Best answer

You may need to groupby twice and then join the results back together:

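s=df.groupby(['id','year']).agg({'avg':'mean','sum':'sum','div':lambda x : x.iloc[0]/x.iloc[1]})  # div: first row (day 1) / second row (day 2) within each (id, year)
s=s.unstack()  # reshape: move year from the row index into the columns
s.columns=s.columns.map('{0[1]}_{0[0]}'.format)  # flatten the MultiIndex columns, e.g. ('avg', 2011) -> '2011_avg'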
s
Out[723]:
     2011_avg  2012_avg  2011_sum  2012_sum  2011_div  2012_div
id
101     0.450     0.250         3         5       2.0       1.5
102     0.505     0.305        12        17       6.0       2.0
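The `div` lambda above divides the first row of each (id, year) group by the second, so it silently assumes every group has exactly two rows, sorted with day 1 before day 2. If that ordering cannot be guaranteed, one option (a sketch, not part of the original answer) is to pivot `day` into columns and divide by label instead of by position. It assumes each (id, year, day) combination occurs only once; with duplicates, `aggfunc='first'` would simply keep one of them.

```python
import pandas as pd

# Same sample data as in the question
data = {'id':[101,101,101,101,102,102,102,102],
        'day':[1,2,1,2,1,2,1,2],
        'year':[2011,2011,2012,2012,2011,2011,2012,2012],
        'avg':[0.500,0.400,0.300,0.200,0.555,0.455,0.355,0.255],
        'sum':[1, 2, 2, 3, 6, 6, 8, 9],
        'div':[2, 1, 3, 2, 6, 1, 6, 3]}
df = pd.DataFrame(data)

# Shuffle the rows to show that the result no longer depends on row order
shuffled = df.sample(frac=1, random_state=0)

# One row per (id, year), one column per day. Assumes each cell holds
# exactly one value, so aggfunc='first' just picks it.
by_day = shuffled.pivot_table(index=['id', 'year'], columns='day',
                              values='div', aggfunc='first')

# Day-1 value divided by day-2 value, then move year into the columns
ratio = (by_day[1] / by_day[2]).unstack('year')
ratio.columns = [f'{year}_div' for year in ratio.columns]
print(ratio)
```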

s2=df.groupby('id').agg({'sum':'sum','div':'sum'})  # overall totals per id; the desired output's div column (8, 16) is a plain sum

Finaldf=s2.join(s)  # join the totals back onto the per-year columns

Finaldf
Out[729]: 
     sum  div  2011_avg  2012_avg  2011_sum  2012_sum  2011_div  2012_div
id
101    8    8     0.450     0.250         3         5       2.0       1.5
102   29   16     0.505     0.305        12        17       6.0       2.0
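The question mentions building one pivot table per column and concatenating them. `pivot_table` also accepts a dict for `aggfunc`, mapping each value column to its own aggregation, so the whole per-year block can come from a single call. The sketch below assumes, as the desired output shows, that the top-level `sum` and `div` columns are plain per-id sums. `first_over_second` is a helper introduced here for illustration; like the answer's lambda, it depends on day 1 preceding day 2 within each group.

```python
import pandas as pd

# Same sample data as in the question
data = {'id':[101,101,101,101,102,102,102,102],
        'day':[1,2,1,2,1,2,1,2],
        'year':[2011,2011,2012,2012,2011,2011,2012,2012],
        'avg':[0.500,0.400,0.300,0.200,0.555,0.455,0.355,0.255],
        'sum':[1, 2, 2, 3, 6, 6, 8, 9],
        'div':[2, 1, 3, 2, 6, 1, 6, 3]}
df = pd.DataFrame(data)


def first_over_second(x):
    # Hypothetical helper, same logic as the answer's lambda:
    # assumes two rows per group with day 1 first
    return x.iloc[0] / x.iloc[1]


# A single pivot_table call replaces the separate per-column pivot tables;
# the aggfunc dict gives each value column its own aggregation
per_year = df.pivot_table(index='id', columns='year',
                          values=['avg', 'sum', 'div'],
                          aggfunc={'avg': 'mean', 'sum': 'sum',
                                   'div': first_over_second})
# Flatten ('avg', 2011) -> '2011_avg'
per_year.columns = [f'{year}_{col}' for col, year in per_year.columns]

# Overall totals per id (the desired output sums both sum and div)
totals = df.groupby('id')[['sum', 'div']].sum()

# Join and put the columns in the order of the desired output
result = totals.join(per_year)[['sum', 'div',
                                '2011_avg', '2012_avg',
                                '2011_sum', '2012_sum',
                                '2011_div', '2012_div']]
print(result)
```

By default `pivot_table` returns its columns sorted, which is why the final selection restores the layout from the desired output.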

Regarding "python pandas dataframe groupby or pivot_table", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55243285/
