
python - Restricting a groupby operation

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:09:25

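I have the three dataframes shown below:

  1. primarydf is a dataframe showing a book's TitleCode, Type (digital/physical), WeekEnding (meaning the unit data is for the week ending on that date), and TotalUnits (how many units were sold).
  2. attachdf is a dataframe showing the product associated with each book, which is released after the book goes on sale.
  3. expecteddf is a detailed version of what I am trying to achieve, with some extra columns for clarity.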

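The actual columns I want are TitleCode, Type, Week 1 Attach Rate, 4 Week Attach Rate, LTD Attach Rate, and Total Attached Units.

  • Week1A = attached products sold in week 1.

  • Week1U = books sold up to and including week 1 of the attached product's release.

  • Week4A = attached products sold up to and including week 4.

  • Week4U = books sold up to and including week 4 of the attached product's release.

  • LTDA = attached products sold up to the most recent date.

  • LTDU = books sold up to the most recent date.

For example, ['Week 1 Attach Rate'] = ['Week1A']/['Week1U'], ['4 Week Attach Rate'] = ['Week4A']/['Week4U'], ['LTD Attach Rate'] = ['LTDA']/['LTDU'].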
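As a quick sanity check of these definitions, the three rates can be worked out by hand for a single group. The sketch below assumes the totals for TitleCode A, Type 1, summed manually from the sample data further down (548 attached units in week 1, 713 through week 4, 1591 life to date; 2789 and 5730 book units for the matching windows). The helper name attach_rate is hypothetical and only there for illustration.

```python
def attach_rate(attached_units: int, book_units: int) -> float:
    """Attach rate = attached units sold / book units sold over the same window."""
    return attached_units / book_units


# Totals for TitleCode A, Type 1, summed by hand from the sample data
week1_rate = attach_rate(548, 2789)   # Week1A / Week1U
week4_rate = attach_rate(713, 5730)   # Week4A / Week4U
ltd_rate = attach_rate(1591, 5730)    # LTDA / LTDU

print(f"{week1_rate:.1%} {week4_rate:.1%} {ltd_rate:.1%}")
# prints: 19.6% 12.4% 27.8%
```

The only part that needs care is the denominator: the book units have to be limited to the same cutoff date as the attached units.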

Dataframes

import pandas as pd

from io import StringIO
attachproduct = StringIO("""TitleCode,Type,WeekEnding,TotalUnits
A,1,12/16/2017 0:00,548
A,1,12/23/2017 0:00,74
A,1,12/30/2017 0:00,87
A,1,1/6/2018 0:00,4
A,1,1/13/2018 0:00,878
A,2,12/16/2017 0:00,49
A,2,12/23/2017 0:00,8498
A,2,12/30/2017 0:00,84
A,2,1/6/2018 0:00,74
A,2,1/13/2018 0:00,453
B,1,12/23/2017 0:00,68
B,1,12/30/2017 0:00,573
B,1,1/6/2018 0:00,75
B,1,1/13/2018 0:00,752
B,1,1/20/2018 0:00,75
B,2,12/23/2017 0:00,17
B,2,12/30/2017 0:00,98
B,2,1/6/2018 0:00,7875
B,2,1/13/2018 0:00,73
B,2,1/20/2018 0:00,75
C,1,12/23/2017 0:00,79
C,1,12/30/2017 0:00,75727
C,1,1/6/2018 0:00,77
C,1,1/13/2018 0:00,3727
C,1,1/20/2018 0:00,72
C,1,1/27/2018 0:00,7275
C,1,2/3/2018 0:00,27
""")

primaryproduct = StringIO("""TitleCode,Type,WeekEnding,TotalUnits
A,1,11/11/2017 0:00,830
A,1,11/18/2017 0:00,830
A,1,11/25/2017 0:00,830
A,1,12/2/2017 0:00,132
A,1,12/9/2017 0:00,161
A,1,12/16/2017 0:00,6
A,1,12/23/2017 0:00,1701
A,1,12/30/2017 0:00,1240
A,2,11/11/2017 0:00,141
A,2,11/18/2017 0:00,141
A,2,11/25/2017 0:00,141
A,2,12/2/2017 0:00,22388
A,2,12/9/2017 0:00,255
A,2,12/16/2017 0:00,90
A,2,12/23/2017 0:00,1471
A,2,12/30/2017 0:00,1010
A,2,1/6/2018 0:00,8
A,2,1/13/2018 0:00,9
B,1,12/2/2017 0:00,254
B,1,12/9/2017 0:00,1022
B,1,12/16/2017 0:00,241
B,1,12/23/2017 0:00,1532
B,1,12/30/2017 0:00,122
B,1,1/6/2018 0:00,442
B,1,1/13/2018 0:00,761
B,1,1/20/2018 0:00,1081
B,2,12/2/2017 0:00,49
B,2,12/9/2017 0:00,351
B,2,12/16/2017 0:00,19951
B,2,12/23/2017 0:00,253
B,2,12/30/2017 0:00,282
B,2,1/6/2018 0:00,601
B,2,1/13/2018 0:00,921
B,2,1/20/2018 0:00,1241
C,1,11/25/2017 0:00,273
C,1,12/2/2017 0:00,151944
C,1,12/9/2017 0:00,95
C,1,12/16/2017 0:00,8736
C,1,12/23/2017 0:00,172
C,1,12/30/2017 0:00,15005
C,1,1/6/2018 0:00,51
C,1,1/13/2018 0:00,52
C,1,1/20/2018 0:00,45
C,1,1/27/2018 0:00,6
C,1,2/3/2018 0:00,55
""")

expected = StringIO("""TitleCode,Type,Week1A,Week4A,LTDA,Week1M,Week4M,LTDM,Week 1 Attach Rate,4 Week Attach Rate,LTD Attach Rate,Total Attached Units
A,1,548,713,1591,2789,5731,5731,19.6%,12.4%,27.8%,1591
A,2,49,8705,9158,23155,25644,25653,0.2%,33.9%,35.7%,9158
B,1,68,1468,1543,3049,301644,5455,2.2%,0.5%,28.3%,1543
B,2,17,8063,8138,20604,22408,23648,0.1%,36.0%,34.4%,8138
C,1,79,79610,86984,161220,176327,176433,0.0%,45.1%,49.3%,86984
""")

attachdf = pd.read_csv(attachproduct, parse_dates=True)
primarydf = pd.read_csv(primaryproduct, parse_dates=True)
expecteddf = pd.read_csv(expected, parse_dates=True)

attachdf['WeekEnding']=pd.to_datetime(attachdf['WeekEnding'])
primarydf['WeekEnding']=pd.to_datetime(primarydf['WeekEnding'])

Getting the life-to-date (LTD) rate is simple, but I'm not sure what the best approach is for the restricted Week 1 and Week 4 rates.

ltdattach = (attachdf.groupby(['TitleCode','Type'])['TotalUnits'].sum()) / (primarydf.groupby(['TitleCode','Type'])['TotalUnits'].sum())
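For illustration, here is a minimal sketch of that LTD division on two tiny made-up frames (toy_attach and toy_primary are hypothetical stand-ins, not the real data). It shows that dividing two grouped sums aligns them on the (TitleCode, Type) index, so each group gets its own rate.

```python
import pandas as pd

# Toy stand-ins for attachdf and primarydf (made-up numbers)
toy_attach = pd.DataFrame({
    "TitleCode": ["A", "A", "B"],
    "Type": [1, 1, 1],
    "TotalUnits": [10, 20, 5],
})
toy_primary = pd.DataFrame({
    "TitleCode": ["A", "A", "B", "B"],
    "Type": [1, 1, 1, 1],
    "TotalUnits": [50, 50, 10, 10],
})

keys = ["TitleCode", "Type"]

# Selecting TotalUnits keeps non-numeric columns (such as dates) out of the sum.
# The division aligns the two Series on their shared (TitleCode, Type) index.
ltd_attach = (
    toy_attach.groupby(keys)["TotalUnits"].sum()
    / toy_primary.groupby(keys)["TotalUnits"].sum()
)

print(ltd_attach)
```

Here group A comes out as 30 / 100 and group B as 5 / 20.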

Best answer

Consider calculating the columns with groupby().transform() and then joining or merging helper dataframes, week1df and week4df:

# ADD NEW COLUMNS TO ATTACH DF
attachdf['WeekNo'] = attachdf.groupby(['TitleCode', 'Type']).cumcount()+1
attachdf['Week4A'] = attachdf[attachdf['WeekNo']<=4].groupby(['TitleCode', 'Type'])['TotalUnits'].transform('sum')
attachdf['LTDA'] = attachdf.groupby(['TitleCode', 'Type'])['TotalUnits'].transform('sum')


# ADD NEW COLUMNS TO PRIMARY DF
primarydf['LTDM'] = primarydf.groupby(['TitleCode', 'Type'])['TotalUnits'].transform('sum')
primarydf['WeekNo'] = primarydf.groupby(['TitleCode', 'Type']).cumcount()+1


# WEEK 1 DF (LEFT JOIN MERGE)
week1df = primarydf.merge(attachdf[attachdf['WeekNo']==1], on=['TitleCode', 'Type'],
                          suffixes=['', '_'], how='left').query('WeekEnding <= WeekEnding_')
week1df['Week1M'] = week1df.groupby(['TitleCode', 'Type'])['TotalUnits'].transform('sum')
week1df = week1df[week1df['WeekNo']==1][['TitleCode', 'Type', 'TotalUnits_', 'Week4A', 'LTDA', 'Week1M']]\
                .rename(columns={'TotalUnits_':'Week1A'})


# WEEK 4 DF (LEFT JOIN MERGE)
week4df = primarydf.merge(attachdf[attachdf['WeekNo']==4], on=['TitleCode', 'Type'],
                          suffixes=['', '_'], how='left').query('WeekEnding <= WeekEnding_')
week4df['Week4M'] = week4df.groupby(['TitleCode', 'Type'])['TotalUnits'].transform('sum')
week4df = week4df[week4df['WeekNo']==1][['TitleCode', 'Type', 'Week4M', 'LTDM']]


# FINAL (MERGE WEEKS WITH PCT COLUMNS)
finaldf = week1df.merge(week4df, on=['TitleCode', 'Type'])

finaldf['Week 1 Attach Rate'] = finaldf['Week1A'] / finaldf['Week1M']
finaldf['Week 4 Attach Rate'] = finaldf['Week4A'] / finaldf['Week4M']
finaldf['LTD Attach Rates'] = finaldf['LTDA'] / finaldf['LTDM']
finaldf['Total Attached Units'] = finaldf['LTDA']
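To see what the two groupby tools used above contribute, here is a small sketch on a made-up frame (toy and its numbers are hypothetical). It assumes rows are already in week order within each group, which is what cumcount relies on to produce week numbers.

```python
import pandas as pd

toy = pd.DataFrame({
    "TitleCode": ["A", "A", "A", "B", "B"],
    "Type": [1, 1, 1, 1, 1],
    "TotalUnits": [10, 20, 30, 1, 2],
})
keys = ["TitleCode", "Type"]

# cumcount numbers rows within each group starting at 0, so +1 gives 1, 2, 3, ...
toy["WeekNo"] = toy.groupby(keys).cumcount() + 1

# transform('sum') returns the group total on every original row,
# unlike .sum(), which collapses each group to a single row
toy["LTD"] = toy.groupby(keys)["TotalUnits"].transform("sum")

# Transforming a filtered subset leaves NaN on the rows that were filtered out,
# because the result is aligned back on the original index
toy["First2"] = toy[toy["WeekNo"] <= 2].groupby(keys)["TotalUnits"].transform("sum")

print(toy)
```

The NaN introduced by the filtered transform is also why a column such as Week4A ends up with a float dtype even though the unit counts are integers.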

Output

print(finaldf)

#    TitleCode  Type  Week1A   Week4A   LTDA  Week1M  Week4M    LTDM  Week 1 Attach Rate  Week 4 Attach Rate  LTD Attach Rates  Total Attached Units
# 0          A     1     548    713.0   1591    2789    5730    5730            0.196486            0.124433          0.277661                  1591
# 1          A     2      49   8705.0   9158   23156   25645   25654            0.002116            0.339442          0.356981                  9158
# 2          B     1      68   1468.0   1543    3049    4374    5455            0.022302            0.335620          0.282860                  1543
# 3          B     2      17   8063.0   8138   20604   22408   23649            0.000825            0.359827          0.344116                  8138
# 4          C     1      79  79610.0  86984  161220  176328  176434            0.000490            0.451488          0.493012                 86984
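The same windowing idea can also be written as one reusable helper. This is a rough sketch of a variation, not the code from the answer above: it swaps transform for plain grouped sums, runs on made-up toy data, and the helper name units_through_week is hypothetical. It assumes each group's rows are sorted by WeekEnding.

```python
import pandas as pd

keys = ["TitleCode", "Type"]

# Made-up toy data: one title, weekly rows already sorted by date
attach = pd.DataFrame({
    "TitleCode": ["A"] * 3,
    "Type": [1] * 3,
    "WeekEnding": pd.to_datetime(["2018-01-13", "2018-01-20", "2018-01-27"]),
    "TotalUnits": [5, 7, 9],
})
primary = pd.DataFrame({
    "TitleCode": ["A"] * 4,
    "Type": [1] * 4,
    "WeekEnding": pd.to_datetime(
        ["2018-01-06", "2018-01-13", "2018-01-20", "2018-01-27"]
    ),
    "TotalUnits": [100, 50, 30, 20],
})

# Week number of the attached product within each (TitleCode, Type) group
attach["WeekNo"] = attach.groupby(keys).cumcount() + 1


def units_through_week(n: int) -> pd.DataFrame:
    """Attached and book units up to and including attach week n (hypothetical helper)."""
    # Attached units sold in weeks 1..n
    attached = attach[attach["WeekNo"] <= n].groupby(keys)["TotalUnits"].sum()
    # Cutoff date = WeekEnding of attach week n, one row per group
    cutoff = attach.loc[attach["WeekNo"] == n, keys + ["WeekEnding"]]
    merged = primary.merge(cutoff, on=keys, suffixes=("", "_cutoff"))
    # Keep only book sales on or before the cutoff, then sum per group
    books = (
        merged[merged["WeekEnding"] <= merged["WeekEnding_cutoff"]]
        .groupby(keys)["TotalUnits"]
        .sum()
    )
    result = pd.DataFrame({"Attached": attached, "Books": books})
    result["Rate"] = result["Attached"] / result["Books"]
    return result


week1 = units_through_week(1)
week2 = units_through_week(2)
print(week1)
print(week2)
```

With this toy data, week 1 covers 5 attached units against 150 books, and week 2 covers 12 attached units against 180 books. Groups with fewer than n attach weeks get no cutoff row and therefore drop out of the Books column, so they would need separate handling.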

Regarding python - Restricting a groupby operation, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48626841/
