python - pandas: how to do a groupby within a groupby

Reposted. Author: 行者123. Updated: 2023-12-01 02:08:32

I have the following data:

,dateTime,magnitude,occurrence,dateTime_s
1,2017-11-20 08:00:09.052260,12861,1,2017-11-20 08:00:09.000000
2,2017-11-20 08:00:09.052270,12868.12,1,2017-11-20 08:00:09.000000
3,2017-11-20 08:00:09.052282,12868.12,1,2017-11-20 08:00:09.000000
4,2017-11-20 08:00:09.052291,12867.5,2,2017-11-20 08:00:09.000000
5,2017-11-20 08:00:09.052315,12867.5,4,2017-11-20 08:00:09.000000
6,2017-11-20 08:00:09.052315,12867,1,2017-11-20 08:00:09.000000
7,2017-11-20 08:00:09.052315,12865.5,1,2017-11-20 08:00:09.000000
8,2017-11-20 08:00:09.052315,12865.89,1,2017-11-20 08:00:09.000000
9,2017-11-20 08:00:12.064744,12867.5,1,2017-11-20 08:00:12.000000
10,2017-11-20 08:00:12.131555,12868.5,2,2017-11-20 08:00:12.000000
11,2017-11-20 08:00:12.333511,12868.5,4,2017-11-20 08:00:12.000000
12,2017-11-20 08:00:12.333511,12869.95,2,2017-11-20 08:00:12.000000
13,2017-11-20 08:00:12.341516,12869.5,1,2017-11-20 08:00:12.000000
14,2017-11-20 08:00:12.343538,12868.5,1,2017-11-20 08:00:12.000000
15,2017-11-20 08:00:12.343538,12868.17,5,2017-11-20 08:00:12.000000
16,2017-11-20 08:00:12.343538,12867.5,2,2017-11-20 08:00:12.000000
17,2017-11-20 08:00:14.148704,12882.5,1,2017-11-20 08:00:14.000000
18,2017-11-20 08:00:14.148748,12882.5,1,2017-11-20 08:00:14.000000
19,2017-11-20 08:00:14.218977,12883.66,1,2017-11-20 08:00:14.000000
20,2017-11-20 08:00:14.218977,12883.5,1,2017-11-20 08:00:14.000000
21,2017-11-20 08:00:14.385283,12882.09,1,2017-11-20 08:00:14.000000
22,2017-11-20 08:00:14.388518,12881.5,1,2017-11-20 08:00:14.000000
23,2017-11-20 08:00:14.577002,12882.5,1,2017-11-20 08:00:14.000000

I use the following code to aggregate it by time (the timestamps have sub-second resolution, and I need per-second values):

import pandas as pd
import numpy as np

df = pd.read_csv('C:/Users/Data/test.csv')
print(df.head(30))

groups = df.groupby('dateTime_s')
df_grouped = (groups.agg({
    'magnitude': np.mean,
    'occurrence': np.sum,
}))
print(df_grouped.head())
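
The question does not show how the dateTime_s column was produced. One plausible way, offered only as a sketch, is to floor dateTime to whole seconds with dt.floor before grouping. The inline CSV (three rows copied from the sample data) and the names csv_text, sample and per_second are assumptions made for illustration.

```python
import io
import pandas as pd

# Three rows copied from the sample data, without the dateTime_s column.
csv_text = """dateTime,magnitude,occurrence
2017-11-20 08:00:09.052260,12861,1
2017-11-20 08:00:09.052270,12868.12,1
2017-11-20 08:00:12.064744,12867.5,1
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["dateTime"])

# Truncate each timestamp to the start of its second.
sample["dateTime_s"] = sample["dateTime"].dt.floor("s")

# Same aggregation as in the question, written with string function names.
per_second = sample.groupby("dateTime_s").agg(
    {"magnitude": "mean", "occurrence": "sum"}
)
print(per_second)
```

With this approach the group keys are real timestamps rather than strings, so the index can be sliced or resampled later.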

The result looks fine:

                               magnitude  occurrence
dateTime_s
2017-11-20 08:00:09.000000  12866.328750          12
2017-11-20 08:00:12.000000  12868.515000          18
2017-11-20 08:00:14.000000  12882.607143           7

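For my research, however, I also need the most frequent magnitude and its occurrence count. How can I group again (within the current groups), find the most frequent magnitude, and display both that magnitude and its frequency?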

I am looking for a result like this:

                                                        groupby magnitude
dateTime_s                     magnitude  occurrence        max  sum
2017-11-20 08:00:09.000000   12866.32875          12    12867.5    6
2017-11-20 08:00:12.000000     12868.515          18    12868.5    7
2017-11-20 08:00:14.000000  12882.607143           7    12882.5    3

Best answer

I believe you need a custom function that sums the occurrence values for the most frequent magnitude in each group:

groups = df.groupby('dateTime_s')
df_grouped = (groups.agg({
    'magnitude': np.mean,
    'occurrence': np.sum,
}))
#print (df_grouped)

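def f(x):
    # most frequent magnitude in this group (by number of rows)
    a = x['magnitude'].value_counts().index[0]
    # total occurrence over the rows that have that magnitude
    b = x.loc[x['magnitude'] == a, 'occurrence'].sum()
    return pd.Series([a,b],['max magn','freq oc'])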

df_grouped1 = groups.apply(f)
#print (df_grouped1)


df = pd.concat([df_grouped, df_grouped1], axis=1)
print (df)
                        magnitude  occurrence  max magn  freq oc
dateTime_s
2017-11-20 08:00:09  12866.328750          12   12867.5      6.0
2017-11-20 08:00:12  12868.515000          18   12868.5      7.0
2017-11-20 08:00:14  12882.607143           7   12882.5      3.0
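
One caveat about the function above: value_counts ranks magnitudes by number of rows, and in the 08:00:09 group both 12868.12 and 12867.5 appear in two rows, so which one lands at index[0] depends on how the tie is ordered and may vary across pandas versions. If "most frequent" is meant to be weighted by occurrence (an assumption, though it matches the desired output in the question), a sketch with a literal groupby inside the groupby could look like this. The name weighted_mode and the inline CSV holding the 08:00:09 rows are introduced here for illustration only.

```python
import io
import pandas as pd

# The 08:00:09 rows from the sample data (dateTime column omitted).
csv_text = """magnitude,occurrence,dateTime_s
12861,1,2017-11-20 08:00:09
12868.12,1,2017-11-20 08:00:09
12868.12,1,2017-11-20 08:00:09
12867.5,2,2017-11-20 08:00:09
12867.5,4,2017-11-20 08:00:09
12867,1,2017-11-20 08:00:09
12865.5,1,2017-11-20 08:00:09
12865.89,1,2017-11-20 08:00:09
"""
sample = pd.read_csv(io.StringIO(csv_text))


def weighted_mode(x):
    # Inner groupby: total occurrence per magnitude within this second.
    totals = x.groupby("magnitude")["occurrence"].sum()
    # idxmax gives the magnitude with the largest total, max gives the total.
    return pd.Series([totals.idxmax(), totals.max()], ["max magn", "freq oc"])


# Select the two needed columns so the function never sees the group key.
result = (
    sample.groupby("dateTime_s")[["magnitude", "occurrence"]]
    .apply(weighted_mode)
)
print(result)
```

Here 12867.5 wins with a total of 6 (2 + 4), while 12868.12 only reaches 2, so the tie on row counts no longer matters.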

Or use only a custom function:

groups = df.groupby('dateTime_s')

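def f(x):
    # most frequent magnitude in this group (by number of rows)
    a = x['magnitude'].value_counts().index[0]
    # total occurrence over the rows that have that magnitude
    b = x.loc[x['magnitude'] == a, 'occurrence'].sum()
    # per-group mean magnitude and total occurrence
    c = x['magnitude'].mean()
    d = x['occurrence'].sum()
    return pd.Series([a,b,c,d],['max magn','freq oc', 'mean', 'sum'])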

df_grouped1 = groups.apply(f)
print (df_grouped1)

                     max magn  freq oc          mean   sum
dateTime_s
2017-11-20 08:00:09   12867.5      6.0  12866.328750  12.0
2017-11-20 08:00:12   12868.5      7.0  12868.515000  18.0
2017-11-20 08:00:14   12882.5      3.0  12882.607143   7.0
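
The same four columns can probably also be built without apply, which tends to be faster on large frames. The sketch below is not the answer's method. It combines named aggregation with a two-level groupby, and it ranks magnitudes by summed occurrence rather than by row count (an assumption about what "most frequent" means). The inline CSV is a hypothetical subset of the sample rows, so the numbers differ from the output above.

```python
import io
import pandas as pd

# A few rows taken from the sample data (dateTime column omitted).
csv_text = """magnitude,occurrence,dateTime_s
12867.5,1,2017-11-20 08:00:12
12868.5,2,2017-11-20 08:00:12
12868.5,4,2017-11-20 08:00:12
12869.95,2,2017-11-20 08:00:12
12882.5,1,2017-11-20 08:00:14
12882.5,1,2017-11-20 08:00:14
12883.66,1,2017-11-20 08:00:14
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Per-second mean magnitude and total occurrence via named aggregation.
summary = sample.groupby("dateTime_s").agg(
    mean=("magnitude", "mean"),
    sum=("occurrence", "sum"),
)

# Total occurrence for every (second, magnitude) pair.
totals = (
    sample.groupby(["dateTime_s", "magnitude"], as_index=False)["occurrence"]
    .sum()
)

# Keep, for each second, the pair with the largest total.
top = (
    totals.sort_values("occurrence", ascending=False)
    .drop_duplicates("dateTime_s")
    .set_index("dateTime_s")
    .rename(columns={"magnitude": "max magn", "occurrence": "freq oc"})
)

result = top.join(summary).sort_index()
print(result)
```

If two magnitudes share the largest total within a second, drop_duplicates keeps whichever the sort happens to place first, so an explicit secondary sort key would be needed for a deterministic choice.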

Regarding python - pandas: how to do a groupby within a groupby, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48849930/
