python - Groupby df columns using Pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:01:08

I have the following data:

member_id  application_name  active_seconds
192180     Opera             6
192180     Opera             7
192180     Chrome            243
5433112    Chrome            52
5433112    Opera             34
5433112    Chrome            465

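I need to group it by member_id and application_name, counting how many times each application was used and totalling its active_seconds.

I used print(df.groupby(['member_id', 'application_name']).count()), but the result contains only active_seconds.

print(df.groupby(['member_id', 'application_name'])['active_seconds'].count())

does not work correctly either. What am I doing wrong?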

Best answer

I think you need aggregate:

# count the rows per group with len and total the seconds with sum
df1 = (df.groupby(['member_id', 'application_name'])
         .agg({'application_name':len, 'active_seconds':sum}))

print (df1)
                            active_seconds  application_name
member_id application_name
192180    Chrome                       243                 1
          Opera                         13                 2
5433112   Chrome                       517                 2
          Opera                         34                 1
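On recent pandas releases the dictionary form above may not run as written, since the grouping keys are typically no longer offered as columns to aggregate, and passing the builtins len and sum can trigger warnings. A minimal sketch of the same result using named aggregation (available since pandas 0.25) follows. The output names n_uses and total_seconds are illustrative assumptions, not names from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "member_id": [192180, 192180, 192180, 5433112, 5433112, 5433112],
    "application_name": ["Opera", "Opera", "Chrome", "Chrome", "Opera", "Chrome"],
    "active_seconds": [6, 7, 243, 52, 34, 465],
})

# Named aggregation: each keyword becomes an output column,
# defined as (input column, aggregation function).
df1 = df.groupby(["member_id", "application_name"]).agg(
    n_uses=("active_seconds", "size"),        # rows per group (hypothetical name)
    total_seconds=("active_seconds", "sum"),  # summed seconds (hypothetical name)
)
print(df1)
```

Because the new columns no longer share a name with an index level, calling reset_index() on this result should not need a rename first.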

If you need reset_index, first rename the columns (otherwise you get ValueError: cannot insert application_name, already exists):

# rename before reset_index so no column clashes with an index level
df1 = (df.groupby(['member_id', 'application_name'])
         .agg({'application_name':len, 'active_seconds':sum})
         .rename(columns={'active_seconds':'count_sec','application_name':'sum_app'})
         .reset_index())

print (df1)
   member_id application_name  count_sec  sum_app
0     192180           Chrome        243        1
1     192180            Opera         13        2
2    5433112           Chrome        517        2
3    5433112            Opera         34        1
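The ValueError mentioned above comes from reset_index trying to insert an index level as a column when a column of that name is already present. The sketch below reproduces the clash on a small frame built for the purpose and shows one way around it. The variable names clash and counts and the column name n_uses are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "member_id": [192180, 192180, 192180, 5433112, 5433112, 5433112],
    "application_name": ["Opera", "Opera", "Chrome", "Chrome", "Opera", "Chrome"],
    "active_seconds": [6, 7, 243, 52, 34, 465],
})
keys = ["member_id", "application_name"]

# A frame whose only column shares its name with an index level.
clash = df.groupby(keys).size().to_frame("application_name")

try:
    clash.reset_index()
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Giving the count column a different name sidesteps the clash
# (n_uses is a hypothetical name).
counts = df.groupby(keys).size().reset_index(name="n_uses")
print(counts)
```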

Timings:

In [208]: %timeit df.groupby(['member_id', 'application_name']).agg({'application_name':len, 'active_seconds':sum}).rename(columns={'active_seconds':'count_sec','application_name':'sum_app'}).reset_index()
10 loops, best of 3: 93.6 ms per loop

In [209]: %timeit (f1(df))
10 loops, best of 3: 127 ms per loop
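The %timeit figures above come from an IPython session. A rough sketch of a similar comparison with the standard-library timeit module is given below for use in a plain script. It is not the original benchmark: single_agg is a named-aggregation stand-in for the dictionary-based call, two_passes mirrors the two-groupby approach, and both function names are assumptions. Absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    "member_id": [192180, 192180, 192180, 5433112, 5433112, 5433112],
    "application_name": ["Opera", "Opera", "Chrome", "Chrome", "Opera", "Chrome"],
    "active_seconds": [6, 7, 243, 52, 34, 465],
})
# Repeat the six rows 1000 times to get 6000 rows.
df = pd.concat([base] * 1000, ignore_index=True)
keys = ["member_id", "application_name"]


def single_agg(frame):
    # One groupby, both aggregations in a single agg call.
    return frame.groupby(keys).agg(
        count_sec=("active_seconds", "sum"),
        sum_app=("active_seconds", "size"),
    ).reset_index()


def two_passes(frame):
    # Two separate groupby calls, joined side by side afterwards.
    total = frame.groupby(keys)["active_seconds"].sum()
    uses = frame.groupby(keys).size()
    return pd.concat([total, uses], axis=1, keys=["count_sec", "sum_app"]).reset_index()


timings = {}
for func in (single_agg, two_passes):
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: func(df), number=10, repeat=3)) / 10
    timings[func.__name__] = best
    print(f"{func.__name__}: {best * 1000:.2f} ms per loop")
```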

Test code:

import pandas as pd

df = pd.DataFrame({'member_id': {0: 192180, 1: 192180, 2: 192180, 3: 5433112, 4: 5433112, 5: 5433112},
'active_seconds': {0: 6, 1: 7, 2: 243, 3: 52, 4: 34, 5: 465},
'application_name': {0: 'Opera', 1: 'Opera', 2: 'Chrome', 3: 'Chrome', 4: 'Opera', 5: 'Chrome'}})
print (df)
#   active_seconds application_name  member_id
#0               6            Opera     192180
#1               7            Opera     192180
#2             243           Chrome     192180
#3              52           Chrome    5433112
#4              34            Opera    5433112
#5             465           Chrome    5433112

df = pd.concat([df]*1000).reset_index(drop=True)
print (len(df))
#6000

df1 = df.groupby(['member_id', 'application_name']).agg({'application_name':len, 'active_seconds':sum}).rename(columns={'active_seconds':'count_sec','application_name':'sum_app'}).reset_index()
print (df1)
#   member_id application_name  count_sec  sum_app
#0     192180           Chrome     243000     1000
#1     192180            Opera      13000     2000
#2    5433112           Chrome     517000     2000
#3    5433112            Opera      34000     1000

def f1(df):
    # sum the seconds and count the rows in two separate groupby passes
    a = df.groupby(['member_id', 'application_name'])['active_seconds'].sum()
    b = df.groupby(['member_id', 'application_name']).size()
    # join both results side by side under the final column names
    return pd.concat([a,b], axis=1, keys=['count_sec','sum_app']).reset_index()

print (f1(df))
#   member_id application_name  count_sec  sum_app
#0     192180           Chrome     243000     1000
#1     192180            Opera      13000     2000
#2    5433112           Chrome     517000     2000
#3    5433112            Opera      34000     1000

Regarding python - Groupby df columns using Pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37878864/
