
python - Pandas equivalent of SELECT COUNT(DISTINCT col1, col2) GROUP BY col3

Reposted. Author: 太空宇宙. Updated: 2023-11-03 12:36:23

Building the DataFrame:

import pandas as pd

people = ['shayna', 'shayna', 'shayna', 'shayna', 'john']
dates = ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18']
places = ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital']
d = {'Person': people, 'Service_Date': dates, 'Site_Where_Served': places}
df = pd.DataFrame(d)
df

Person  Service_Date  Site_Where_Served
shayna  01-01-18      hospital
shayna  01-01-18      hospital
shayna  01-01-18      inpatient
shayna  01-02-18      hospital
john    01-02-18      hospital

What I want is to count the unique (Person, Service_Date) pairs, grouped by Site_Where_Served.

Expected output:

Site_Where_Served  Site_Visit_Count
hospital           3
inpatient          1

My attempt:

df[['Person', 'Service_Date']].groupby(df['Site_Where_Served']).nunique().reset_index(name='Site_Visit_Count')

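But the reset_index(name=...) call fails on this result. So I tried leaving it out, and then realized that it is not counting unique (Person, Service_Date) pairs at all, because the output looks like this: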

                   Person  Service_Date
Site_Where_Served
hospital                2             2
inpatient               1             1
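To see where the 2 and 2 for hospital come from, here is a minimal sketch on the same sample data. It only illustrates that nunique() counts each column independently, whereas the desired figure counts distinct rows; the variable names hospital, per_column and pair_count are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['shayna', 'shayna', 'shayna', 'shayna', 'john'],
    'Service_Date': ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18'],
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital'],
})

# Look at the hospital rows only.
hospital = df[df['Site_Where_Served'] == 'hospital']

# nunique() works column by column: distinct people and distinct dates,
# each counted without regard to the other column.
per_column = hospital[['Person', 'Service_Date']].nunique()

# Distinct (Person, Service_Date) pairs have to be counted on whole rows.
pair_count = len(hospital[['Person', 'Service_Date']].drop_duplicates())

print(per_column)
print(pair_count)
```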

Best answer

drop_duplicates with groupby + count

(df.drop_duplicates()                      # keep one row per distinct (Person, Service_Date, Site_Where_Served)
   .groupby('Site_Where_Served')
   .Site_Where_Served.count()              # rows remaining per site
   .reset_index(name='Site_Visit_Count')
)

  Site_Where_Served  Site_Visit_Count
0          hospital                 3
1         inpatient                 1
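A bare drop_duplicates() compares every column, which works here because the frame holds only the three columns of interest. If the frame carried other columns as well, a reasonable variation is to restrict the comparison with subset=. The sketch below assumes a hypothetical extra column, Note, that is not part of the question, and the helper name key_cols is likewise made up.

```python
import pandas as pd

# Sample data from the question plus a hypothetical "Note" column
# (an assumption for illustration, not in the original post).
df = pd.DataFrame({
    'Person': ['shayna', 'shayna', 'shayna', 'shayna', 'john'],
    'Service_Date': ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18'],
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital'],
    'Note': ['a', 'b', 'c', 'd', 'e'],
})

key_cols = ['Person', 'Service_Date', 'Site_Where_Served']

result = (
    df.drop_duplicates(subset=key_cols)   # ignore "Note" when comparing rows
      .groupby('Site_Where_Served')
      .size()                             # rows left per site
      .reset_index(name='Site_Visit_Count')
)
print(result)
```

Without subset=, every row in this extended frame would count as unique because the Note values all differ.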

Note that one small difference between count and size is that the former does not count NaN entries.
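The difference is easiest to see on a column that actually contains a missing value. The toy frame below is hypothetical (its column names site and person are not from the question) and is only meant as a small illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one person value is missing.
toy = pd.DataFrame({
    'site': ['hospital', 'hospital', 'inpatient'],
    'person': ['shayna', np.nan, 'john'],
})

grouped = toy.groupby('site')['person']

sizes = grouped.size()    # number of rows per group, NaN included
counts = grouped.count()  # number of non-NaN values per group

print(sizes)
print(counts)
```

In the answer's snippet the counted column is the grouping key itself, so on the sample data the two methods give the same numbers.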


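Tuple-ize, groupby and nunique

This is really just a fix for your current solution, but I do not recommend it: it is long-winded and takes more steps than necessary. First convert each row of your two columns to a tuple, then group by Site_Where_Served and count the distinct tuples: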

(df[['Person', 'Service_Date']]
   .apply(tuple, axis=1)                   # one (Person, Service_Date) tuple per row
   .groupby(df.Site_Where_Served)
   .nunique()                              # distinct tuples per site
   .reset_index(name='Site_Visit_Count')
)

  Site_Where_Served  Site_Visit_Count
0          hospital                 3
1         inpatient                 1
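As a quick sanity check, both approaches can be run side by side on the sample data and compared. This is only a sketch; the names via_drop and via_tuples are made up here, and pd.testing.assert_frame_equal is used simply as a convenient way to compare the two results.

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['shayna', 'shayna', 'shayna', 'shayna', 'john'],
    'Service_Date': ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18'],
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital'],
})

# Approach 1: drop duplicate rows, then count rows per site.
via_drop = (
    df.drop_duplicates()
      .groupby('Site_Where_Served')
      .Site_Where_Served.count()
      .reset_index(name='Site_Visit_Count')
)

# Approach 2: build (Person, Service_Date) tuples, then count distinct tuples per site.
via_tuples = (
    df[['Person', 'Service_Date']]
      .apply(tuple, axis=1)
      .groupby(df['Site_Where_Served'])
      .nunique()
      .reset_index(name='Site_Visit_Count')
)

# Raises AssertionError if the two frames differ.
pd.testing.assert_frame_equal(via_drop, via_tuples)
print(via_drop)
```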

Regarding python - Pandas equivalent of SELECT COUNT(DISTINCT col1, col2) GROUP BY col3, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50360326/
