python - pandas: using groupby to sum the number of column values

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:38:48

I have the following DataFrame:

import pandas as pd

url='https://raw.githubusercontent.com/108michael/ms_thesis/master/mpl.Bspons.merge.1'
df=pd.read_csv(url, index_col=0)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df = df.set_index(['date'])

df.head(3)

state year unemployment log_diff_unemployment id.thomas party type bills id.fec years_exp session name disposition catcode naics
date
2006-05-01 AK 2006 6.6 -0.044452 1440 Republican sen s2686-109 S2AK00010 39 109 National Cable & Telecommunications Association support C4500 81
2006-05-01 AK 2006 6.6 -0.044452 1440 Republican sen s2686-109 S2AK00010 39 109 National Cable & Telecommunications Association support C4500 517
2007-03-27 AK 2007 6.3 -0.046520 1440 Republican sen s1000-110 S2AK00010 40 110 National Treasury Employees Union support L1100 NaN

I want to sum the number of bills in each group defined by catcode > disposition > id.fec (per calendar year). I'm using the following code:

df['billsum'] = df.groupby([pd.Grouper(level='date', freq='A'), 'catcode', \
'disposition', 'id.fec']).bills.transform('sum')

It returns:

df.head(3)

state year unemployment log_diff_unemployment id.thomas party type bills id.fec years_exp session name disposition catcode naics billsum
date
2006-05-01 AK 2006 6.6 -0.044452 1440 Republican sen s2686-109 S2AK00010 39 109 National Cable & Telecommunications Association support C4500 81 s2686-109s2686-109
2006-05-01 AK 2006 6.6 -0.044452 1440 Republican sen s2686-109 S2AK00010 39 109 National Cable & Telecommunications Association support C4500 517 s2686-109s2686-109
2007-03-27 AK 2007 6.3 -0.046520 1440 Republican sen s1000-110 S2AK00010 40 110 National Treasury Employees Union support L1100 NaN s1000-110

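Instead of returning the "number" of bills in each group, the code returns all of the bills in each group, concatenated together. I only want the number of bills per group. Does anyone know how to make this work?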
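The concatenation happens because the bills column holds strings (object dtype), and summing Python strings joins them with +. The sketch below reproduces this on the three rows shown. It keeps only the columns involved in the grouping and, as an assumption for portability across pandas versions, groups by the calendar year of the index (df.index.year) instead of pd.Grouper(level='date', freq='A').

```python
import pandas as pd

# Three rows mirroring df.head(3); only the columns used for grouping are kept
df = pd.DataFrame(
    {
        "bills": ["s2686-109", "s2686-109", "s1000-110"],
        "id.fec": ["S2AK00010"] * 3,
        "disposition": ["support"] * 3,
        "catcode": ["C4500", "C4500", "L1100"],
    },
    index=pd.DatetimeIndex(["2006-05-01", "2006-05-01", "2007-03-27"], name="date"),
)
# Force plain object dtype so the strings behave as in the original data
df = df.astype({"bills": object})

# Calendar year of each row stands in for the annual pd.Grouper
keys = [df.index.year, "catcode", "disposition", "id.fec"]

# 'sum' on an object column applies + to the strings, i.e. concatenation
summed = df.groupby(keys)["bills"].transform("sum")
print(summed.tolist())
# ['s2686-109s2686-109', 's2686-109s2686-109', 's1000-110']
```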

Best answer

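I think you need transform with 'size', not 'sum'. Because bills holds strings, 'sum' concatenates them, whereas 'size' returns the number of rows in each group: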

df['billsum'] = df.groupby([pd.Grouper(level='date', freq='A'), 'catcode', \
'disposition', 'id.fec']).bills.transform('size')

print(df.head(3))
state year unemployment log_diff_unemployment id.thomas \
date
2006-05-01 AK 2006.0 6.6 -0.044452 1440
2006-05-01 AK 2006.0 6.6 -0.044452 1440
2007-03-27 AK 2007.0 6.3 -0.046520 1440

party type bills id.fec years_exp session \
date
2006-05-01 Republican sen s2686-109 S2AK00010 39 109
2006-05-01 Republican sen s2686-109 S2AK00010 39 109
2007-03-27 Republican sen s1000-110 S2AK00010 40 110

name disposition \
date
2006-05-01 National Cable & Telecommunications Association support
2006-05-01 National Cable & Telecommunications Association support
2007-03-27 National Treasury Employees Union support

catcode naics billsum
date
2006-05-01 C4500 81 2
2006-05-01 C4500 517 2
2007-03-27 L1100 NaN 1
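Note that 'size' counts every row, including rows where bills is NaN. Two related options may fit better depending on intent: 'count' skips NaN bills, and 'nunique' counts distinct bills. In the rows shown, the two 2006 entries carry the same bill s2686-109 and differ only in naics, so 'nunique' gives 1 where 'size' gives 2. The sketch below compares them on the same three rows (other columns omitted; df.index.year is an assumed stand-in for the annual pd.Grouper) and also shows the aggregated form without transform.

```python
import pandas as pd

# Three rows mirroring df.head(3); only the grouping-related columns are kept
df = pd.DataFrame(
    {
        "bills": ["s2686-109", "s2686-109", "s1000-110"],
        "id.fec": ["S2AK00010"] * 3,
        "disposition": ["support"] * 3,
        "catcode": ["C4500", "C4500", "L1100"],
    },
    index=pd.DatetimeIndex(["2006-05-01", "2006-05-01", "2007-03-27"], name="date"),
)

keys = [df.index.year, "catcode", "disposition", "id.fec"]
grouped = df.groupby(keys)["bills"]

df["billsum"] = grouped.transform("size")        # rows per group, NaN included
df["billcount"] = grouped.transform("count")     # non-NaN bills per group
df["billunique"] = grouped.transform("nunique")  # distinct bills per group

print(df[["bills", "billsum", "billcount", "billunique"]])

# Without transform: one row per group instead of one value per original row
per_group = grouped.size()
print(per_group)
```

transform broadcasts the per-group result back onto every original row, which is what makes it usable as a new column; the plain size() call is the better choice when a per-group summary table is wanted instead.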

For the question "python - pandas: using groupby to sum the number of column values", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36998012/
