python - Pandas groupby - 对用户进行分组并计算订阅类型-6ren

python - Pandas groupby - 对用户进行分组并计算订阅类型

转载作者：行者123 更新时间：2023-12-01 03:39:35

我正在尝试使用 pandas 对成员进行分组，计算成员已购买的订阅类型的数量并获取每个成员的总支出。加载后，数据类似于:

df = 

Member Nbr  Member Name-First   Member Name-Last        Date-Joined             Member Type         Amount  Addr-Formatted  Date-Birth              Gender      Status    
1           Aboud               Tordon                  2010-03-31 00:00:00     1 Year Membership   331.00  ADDRESS_1       1972-08-01 00:00:00     Male        Active  
1           Aboud               Tordon                  2011-04-16 00:00:00     1 Year Membership   334.70  ADDRESS_1       1972-08-01 00:00:00     Male        Active  
1           Aboud               Tordon                  2012-08-06 00:00:00     1 Year Membership   344.34  ADDRESS_1       1972-08-01 00:00:00     Male        Active  
1           Aboud               Tordon                  2013-08-21 00:00:00     1 Year Membership   362.53  ADDRESS_1       1972-08-01 00:00:00     Male        Active  
1           Aboud               Tordon                  2015-08-31 00:00:00     1 Year Membership   289.47  ADDRESS_1       1972-08-01 00:00:00     Male        Active  

2          Jean                 Manuel                  2012-12-10 00:00:00     4 Month Membership  148.79  ADDRESS_2       1984-08-01 00:00:00     Male        In-Active   
2          Jean                 Manuel                  2013-03-13 00:00:00     1 Year Membership   348.46  ADDRESS_2       1984-08-01 00:00:00     Male        In-Active
2          Jean                 Manuel                  2014-03-15 00:00:00     1 Year Membership   316.86  ADDRESS_2       1984-08-01 00:00:00     Male        In-Active   

3          Val                  Adams                   2010-02-09 00:00:00     1 Year Membership   333.25  ADDRESS_3       1934-10-26 00:00:00     Female      Active  
3          Val                  Adams                   2011-03-09 00:00:00     1 Year Membership   333.88  ADDRESS_3       1934-10-26 00:00:00     Female      Active
3          Val                  Adams                   2012-04-03 00:00:00     1 Year Membership   318.34  ADDRESS_3       1934-10-26 00:00:00     Female      Active
3          Val                  Adams                   2013-04-15 00:00:00     1 Year Membership   350.73  ADDRESS_3       1934-10-26 00:00:00     Female      Active  
3          Val                  Adams                   2014-04-19 00:00:00     1 Year Membership   291.63  ADDRESS_3       1934-10-26 00:00:00     Female      Active  
3          Val                  Adams                   2015-04-19 00:00:00     1 Year Membership   247.35  ADDRESS_3       1934-10-26 00:00:00     Female      Active

5          Michele              Younes                  2010-02-14 00:00:00     1 Year Membership   333.25  ADDRESS_4       1933-06-23 00:00:00     Female      In-Active   
5          Michele              Younes                  2011-05-23 00:00:00     1 Year Membership   317.77  ADDRESS_4       1933-06-23 00:00:00     Female      In-Active   
5          Michele              Younes                  2012-05-28 00:00:00     1 Year Membership   328.16  ADDRESS_4       1933-06-23 00:00:00     Female      In-Active   
5          Michele              Younes                  2013-05-31 00:00:00     1 Year Membership   360.02  ADDRESS_4       1933-06-23 00:00:00     Female      In-Active

7          Adam                 Herzburg                2010-07-11 00:00:00     1 Year Membership   335 48  ADDRESS_5       1987-08-30 00:00:00     Male        In-Active
...

由于最受欢迎的成员(member)类型是1个月、3个月、4个月、 6 个月 和 1 年 我想创建一个列来计算给定成员(member)已购买的这些成员(member)类型 的数量。

还有2个月、5个月、7个月、8个月和仅池 成员类型出现的频率非常低，如果成员拥有该类型的契约(Contract)，我想将其算作“杂项”。

我还试图获取一个“总计”列，该列总结了给定成员(member)的支出总额。

本质上我想将我以前的数据框转换为类似:

df1=
Member Nbr  Member Name-First   Member Name-Last    1_Month  3_Month  4_Month  6_Month  1_Year  Misc    Total    Addr-Formatted Date-Birth           Gender     Status
1           Aboud               Tordon              0        0        0        0        5       0       1662.04  ADDRESS_1      1972-08-01 00:00:00  Male       Active
2           Jean                Manuel              0        0        1        0        2       0       813.86   ADDRESS_2      1984-08-01 00:00:00  Male       In-Active
3           Val                 Adams               0        0        0        0        6       0       1875.18  ADDRESS_3      1934-10-26 00:00:00  Female     Active
5           Michele             Younes              0        0        0        0        4       0       1339.20  ADDRESS_4      1933-06-23 00:00:00  Female     In-Active
7           Adam                Herzburg            0        0        0        0        1       0       335.48   ADDRESS_5      1933-06-23 00:00:00  Male       In-Active

...

我遇到的问题是，每当我使用groupby时，我只能对金额进行求和，或者单独获取一种特定类型契约(Contract)的计数，但我无法使其类似于 df1。

最佳答案

你可以先map Member Type 列的值由 dict d 然后 fillna按值其他:

d = {'1 Year Membership':'1_Year','1 Month Membership':'1_Month', '3 Month Membership':'3_Month', '4 Month Membership':'4_Month', '6 Month Membership':'6_Month'}
df['Type'] = df['Member Type'].map(d).fillna('Misc')
#print (df)

然后groupby并聚合sum:

df0 = df.groupby(['Member Nbr','Member Name-First','Member Name-Last','Addr-Formatted','Date-Birth','Gender','Status'])['Amount'].sum()
#print (df0)

将列类型添加到分组列列表并聚合 size ，然后通过 unstack reshape 形状:

df1 = df.groupby(['Member Nbr','Member Name-First','Member Name-Last','Addr-Formatted','Date-Birth','Gender','Status', 'Type']).size().unstack(fill_value=0)
#print (df1)

最后concat两个数据帧:

print (pd.concat([df0, df1], axis=1).reset_index())
   Member Nbr Member Name-First Member Name-Last Addr-Formatted  \
0           1             Aboud           Tordon      ADDRESS_1   
1           2              Jean           Manuel      ADDRESS_2   
2           3               Val            Adams      ADDRESS_3   
3           5           Michele           Younes      ADDRESS_4   
4           7              Adam         Herzburg      ADDRESS_5   

            Date-Birth  Gender     Status   Amount  1_Year  4_Month  
0  1972-08-01 00:00:00    Male     Active  1662.04       5        0  
1  1984-08-01 00:00:00    Male  In-Active   814.11       2        1  
2  1934-10-26 00:00:00  Female     Active  1875.18       6        0  
3  1933-06-23 00:00:00  Female  In-Active  1339.20       4        0  
4  1987-08-30 00:00:00    Male  In-Active   335.48       1        0

编辑:

如果成员类型列中缺少某些值，则需要添加reindex :

df1 = df.groupby(['Member Nbr','Member Name-First','Member Name-Last','Addr-Formatted','Date-Birth','Gender','Status', 'Type']).size().unstack(fill_value=0).reindex(columns=d.values(), fill_value=0)
#print (df1)

print (pd.concat([df0, df1], axis=1).reset_index())
   Member Nbr Member Name-First Member Name-Last Addr-Formatted  \
0           1             Aboud           Tordon      ADDRESS_1   
1           2              Jean           Manuel      ADDRESS_2   
2           3               Val            Adams      ADDRESS_3   
3           5           Michele           Younes      ADDRESS_4   
4           7              Adam         Herzburg      ADDRESS_5   

            Date-Birth  Gender     Status   Amount  6_Month  3_Month  4_Month  \
0  1972-08-01 00:00:00    Male     Active  1662.04        0        0        0   
1  1984-08-01 00:00:00    Male  In-Active   814.11        0        0        1   
2  1934-10-26 00:00:00  Female     Active  1875.18        0        0        0   
3  1933-06-23 00:00:00  Female  In-Active  1339.20        0        0        0   
4  1987-08-30 00:00:00    Male  In-Active   335.48        0        0        0   

   1_Year  1_Month  
0       5        0  
1       2        0  
2       6        0  
3       4        0  
4       1        0

<小时/>

可以使用第二个groupby(最快的)pivot_table :

df2 = df.pivot_table(index=['Member Nbr','Member Name-First','Member Name-Last','Addr-Formatted','Date-Birth','Gender','Status'], columns='Type', values='Amount', aggfunc=len, fill_value=0).reindex(columns=d.values(), fill_value=0)
print (pd.concat([df0, df2], axis=1).reset_index())
   Member Nbr Member Name-First Member Name-Last Addr-Formatted  \
0           1             Aboud           Tordon      ADDRESS_1   
1           2              Jean           Manuel      ADDRESS_2   
2           3               Val            Adams      ADDRESS_3   
3           5           Michele           Younes      ADDRESS_4   
4           7              Adam         Herzburg      ADDRESS_5   

            Date-Birth  Gender     Status   Amount  6_Month  3_Month  4_Month  \
0  1972-08-01 00:00:00    Male     Active  1662.04        0        0        0   
1  1984-08-01 00:00:00    Male  In-Active   814.11        0        0        1   
2  1934-10-26 00:00:00  Female     Active  1875.18        0        0        0   
3  1933-06-23 00:00:00  Female  In-Active  1339.20        0        0        0   
4  1987-08-30 00:00:00    Male  In-Active   335.48        0        0        0   

   1_Year  1_Month  
0       5        0  
1       2        0  
2       6        0  
3       4        0  
4       1        0

关于python - Pandas groupby - 对用户进行分组并计算订阅类型，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/39845028/

文章推荐： javascript - HTTP GET 请求( Node )返回 501

文章推荐： Powershell 3 : Remove last line of text file

文章推荐： javascript - 当单元格包含逗号时数据表发出警告

python - Python 中的集群或合并集群以减少组数 (Python)
我正在处理一组标记为 160 个组的 173k 点。我想通过合并最接近的(到 9 或 10 个组)来减少组/集群的数量。我搜索过 sklearn 或类似的库，但没有成功。我猜它只是通过 knn 聚类
python - python 列表的子集基于同一列表的元素组，pythonically
我有一个扁平数字列表，这些数字逻辑上以 3 为一组，其中每个三元组是 (number, __ignored, flag[0 or 1])，例如: [7,56,1, 8,0,0, 2,0,0, 6,1,
python - 激活 Python 虚拟环境并在另一个 Python 脚本中调用 Python 脚本
我正在使用 pipenv 来管理我的包。我想编写一个 python 脚本来调用另一个使用不同虚拟环境(VE)的 python 脚本。如何运行使用 VE1 的 python 脚本 1 并调用另一个 p
python - 在焕然一新的 Python 环境中以编程方式从 Python 内部执行 Python 文件
假设我有一个文件 script.py 位于 path = "foo/bar/script.py"。我正在寻找一种在 Python 中通过函数 execute_script() 从我的主要 Python
python - 从 python 脚本但在 python 脚本之外运行 python 脚本
这听起来像是谜语或笑话，但实际上我还没有找到这个问题的答案。问题到底是什么？我想运行 2 个脚本。在第一个脚本中，我调用另一个脚本，但我希望它们继续并行，而不是在两个单独的线程中。主要是我不希望第
python - 使用不同的 python 从 python 运行 python 脚本
我有一个带有 python 2.5.5 的软件。我想发送一个命令，该命令将在 python 2.7.5 中启动一个脚本，然后继续执行该脚本。我试过用 #!python2.7.5 和http://re
python - 为什么从 Python 命令行调用 Python 时 Python 无法找到并运行我的脚本？
我在 python 命令行(使用 python 2.7)中，并尝试运行 Python 脚本。我的操作系统是 Windows 7。我已将我的目录设置为包含我所有脚本的文件夹，使用: os.chdir("
python - 使用动态版本的 Python 执行嵌入的 Python 代码时出现致命的 Python 错误
剧透:部分解决(见最后)。以下是使用 Python 嵌入的代码示例: #include int main(int argc, char** argv) { Py_SetPythonHome
python - python 中识别 python 数组或列表中最大累积差异的最快方法是什么？
假设我有以下列表，对应于及时的股票价格: prices = [1, 3, 7, 10, 9, 8, 5, 3, 6, 8, 12, 9, 6, 10, 13, 8, 4, 11] 我想确定以下总体上最
python - (Python) 通过单选按钮 python 更新背景
所以我试图在选择某个单选按钮时更改此框架的背景。我的框架位于一个类中，并且单选按钮的功能位于该类之外。 (这样我就可以在所有其他框架上调用它们。) 问题是每当我选择单选按钮时都会出现以下错误: co
python - python 中的字符串与正则表达式比较在 python 中失败
我正在尝试将字符串与 python 中的正则表达式进行比较，如下所示， #!/usr/bin/env python3 import re str1 = "Expecting property name
python - python 如何加载Boost.Python 库？
考虑以下原型(prototype) Boost.Python 模块，该模块从单独的 C++ 头文件中引入类“D”。 /* file: a/b.cpp */ BOOST_PYTHON_MODULE(c)
python - python 检查模块 python 的问题
如何编写一个程序来“识别函数调用的行号？” python 检查模块提供了定位行号的选项，但是， def di(): return inspect.currentframe().f_back.f_l
python - 系统 python 与用户 python
我已经使用 macports 安装了 Python 2.7，并且由于我的 $PATH 变量，这就是我输入 $ python 时得到的变量。然而，virtualenv 默认使用 Python 2.6，除
python - [Python] : Python re. 长字符串行的搜索速度优化
我只想问如何加快 python 上的 re.search 速度。我有一个很长的字符串行，长度为 176861(即带有一些符号的字母数字字符)，我使用此函数测试了该行以进行研究: def getExe
python - 编辑字符串 python 正则表达式 python
list1= [u'%app%%General%%Council%', u'%people%', u'%people%%Regional%%Council%%Mandate%', u'%ppp%%Ge
python - Python 映射中的副作用(Python "do" block )
这个问题在这里已经有了答案: Is it Pythonic to use list comprehensions for just side effects? (7 个答案) 关闭 4 个月前。告
python - 使用其值逻辑组合两个 python 列表 - Python
我想用 Python 将两个列表组合成一个列表，方法如下: a = [1,1,1,2,2,2,3,3,3,3] b= ["Sun", "is", "bright", "June","and" ,"Ju
python - Boost.Python python 链接错误
我正在运行带有最新 Boost 发行版 (1.55.0) 的 Mac OS X 10.8.4 (Darwin 12.4.0)。我正在按照说明 here构建包含在我的发行版中的教程 Boost-Pyth
python - 在 Python 中仅使用内置库制作一个基本的网络抓取工具 - Python
学习 Python，我正在尝试制作一个没有任何第 3 方库的网络抓取工具，这样过程对我来说并没有简化，而且我知道我在做什么。我浏览了一些在线资源，但所有这些都让我对某些事情感到困惑。 html 看起来

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

python - Pandas groupby - 对用户进行分组并计算订阅类型