
python - Group by two columns and take the maximum of a third column in pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:52:42

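I have a DataFrame with the columns PERIOD_START_TIME, ID, several more columns, and VALUE. I need to group by PERIOD_START_TIME and ID (there are duplicate rows for the same time and ID) and take the maximum of the VALUE column. df: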

PERIOD_START_TIME     ID  ...  VALUE
06.01.2017 02:00:00 55 ... 35
06.01.2017 02:00:00 55 ... 22
06.01.2017 03:00:00 55 ... 63
06.01.2017 03:00:00 55 ... 33
06.01.2017 04:00:00 55 ... 63
06.01.2017 04:00:00 55 ... 45
06.01.2017 02:00:00 65 ... 10
06.01.2017 02:00:00 65 ... 5
06.01.2017 03:00:00 65 ... 22
06.01.2017 03:00:00 65 ... 5
06.01.2017 04:00:00 65 ... 12
06.01.2017 04:00:00 65 ... 15

Desired output:

PERIOD_START_TIME     ID  ...  VALUE
06.01.2017 02:00:00 55 ... 35
06.01.2017 03:00:00 55 ... 63
06.01.2017 04:00:00 55 ... 63
06.01.2017 02:00:00 65 ... 10
06.01.2017 03:00:00 65 ... 22
06.01.2017 04:00:00 65 ... 15

Best answer

Use groupby and aggregate with max:

print (df)
PERIOD_START_TIME ID A VALUE
0 06.01.2017 02:00:00 55 8 35
1 06.01.2017 02:00:00 55 8 22
2 06.01.2017 03:00:00 55 8 63
3 06.01.2017 03:00:00 55 8 33
4 06.01.2017 04:00:00 55 8 63
5 06.01.2017 04:00:00 55 8 45
6 06.01.2017 02:00:00 65 8 10
7 06.01.2017 02:00:00 65 8 5
8 06.01.2017 03:00:00 65 8 22
9 06.01.2017 03:00:00 65 8 5
10 06.01.2017 04:00:00 65 8 12
11 06.01.2017 04:00:00 65 8 15

df = df.groupby(['PERIOD_START_TIME','ID'], as_index=False)['VALUE'].max()

Or:

df = df.groupby(['PERIOD_START_TIME','ID'])['VALUE'].max().reset_index()

print (df)
PERIOD_START_TIME ID VALUE
0 06.01.2017 02:00:00 55 35
1 06.01.2017 02:00:00 65 10
2 06.01.2017 03:00:00 55 63
3 06.01.2017 03:00:00 65 22
4 06.01.2017 04:00:00 55 63
5 06.01.2017 04:00:00 65 15
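
As a minimal, self-contained sketch of the two calls above: the frame below is rebuilt by hand from the printed sample, with column A standing in for the elided "..." columns, and the names result and alt are only illustrative. Both forms should give the same three-column frame.

```python
import pandas as pd

# Sample data rebuilt from the printed frame; column A is a stand-in
# for the elided "..." columns of the question.
df = pd.DataFrame({
    "PERIOD_START_TIME": ["06.01.2017 02:00:00", "06.01.2017 02:00:00",
                          "06.01.2017 03:00:00", "06.01.2017 03:00:00",
                          "06.01.2017 04:00:00", "06.01.2017 04:00:00"] * 2,
    "ID": [55] * 6 + [65] * 6,
    "A": [8] * 12,
    "VALUE": [35, 22, 63, 33, 63, 45, 10, 5, 22, 5, 12, 15],
})

keys = ["PERIOD_START_TIME", "ID"]

# as_index=False keeps the grouping keys as ordinary columns.
result = df.groupby(keys, as_index=False)["VALUE"].max()

# Equivalent: aggregate to a Series with a MultiIndex, then flatten it.
alt = df.groupby(keys)["VALUE"].max().reset_index()

print(result)
```

Note that only the grouping keys and VALUE survive here; any other column, such as A, is dropped because it was not selected for aggregation.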

To keep the other columns as well, apply idxmax to the original df and select the rows with loc:

df = df.loc[df.groupby(['PERIOD_START_TIME','ID'])['VALUE'].idxmax()]
print (df)
PERIOD_START_TIME ID A VALUE
0 06.01.2017 02:00:00 55 8 35
6 06.01.2017 02:00:00 65 8 10
2 06.01.2017 03:00:00 55 8 63
8 06.01.2017 03:00:00 65 8 22
4 06.01.2017 04:00:00 55 8 63
11 06.01.2017 04:00:00 65 8 15
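
The idxmax approach can be sketched end to end as follows. This assumes the index labels of the frame are unique, since loc selects by label. The sample frame is rebuilt by hand from the printed data, and the names idx, rows and tidy are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the printed frame; column A is a stand-in
# for the elided "..." columns of the question.
df = pd.DataFrame({
    "PERIOD_START_TIME": ["06.01.2017 02:00:00", "06.01.2017 02:00:00",
                          "06.01.2017 03:00:00", "06.01.2017 03:00:00",
                          "06.01.2017 04:00:00", "06.01.2017 04:00:00"] * 2,
    "ID": [55] * 6 + [65] * 6,
    "A": [8] * 12,
    "VALUE": [35, 22, 63, 33, 63, 45, 10, 5, 22, 5, 12, 15],
})

# For each (PERIOD_START_TIME, ID) group, idxmax returns the index label
# of the row holding the largest VALUE (the first one if there is a tie).
idx = df.groupby(["PERIOD_START_TIME", "ID"])["VALUE"].idxmax()

# Selecting those labels returns whole rows, so every column is kept.
rows = df.loc[idx]

# Optional: replace the original labels with a fresh 0..n-1 index.
tidy = rows.reset_index(drop=True)

print(rows)
```

Because whole rows are selected, the values in the other columns come from the same row as the maximum, which the plain max aggregation does not guarantee when several columns are aggregated.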

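Alternative (again starting from the original df):

cols = ['PERIOD_START_TIME','ID']
# sort VALUE in descending order within each group so that first() picks the maximum
df = df.sort_values(cols + ['VALUE'], ascending=[True, True, False]).groupby(cols, as_index=False).first()
print (df)
PERIOD_START_TIME ID A VALUE
0 06.01.2017 02:00:00 55 8 35
1 06.01.2017 02:00:00 65 8 10
2 06.01.2017 03:00:00 55 8 63
3 06.01.2017 03:00:00 65 8 22
4 06.01.2017 04:00:00 55 8 63
5 06.01.2017 04:00:00 65 8 15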
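
One point worth checking with any sort-based variant: sorting only by the grouping keys leaves the order of VALUE inside each group unspecified, so the first row of a group is not necessarily its maximum (in the sample, the group 04:00:00 / 65 holds 12 before 15). The sketch below swaps in a different but related technique, sort_values followed by drop_duplicates, which sorts by VALUE explicitly. The sample frame is rebuilt by hand from the printed data, and the name result is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the printed frame; column A is a stand-in
# for the elided "..." columns of the question.
df = pd.DataFrame({
    "PERIOD_START_TIME": ["06.01.2017 02:00:00", "06.01.2017 02:00:00",
                          "06.01.2017 03:00:00", "06.01.2017 03:00:00",
                          "06.01.2017 04:00:00", "06.01.2017 04:00:00"] * 2,
    "ID": [55] * 6 + [65] * 6,
    "A": [8] * 12,
    "VALUE": [35, 22, 63, 33, 63, 45, 10, 5, 22, 5, 12, 15],
})

cols = ["PERIOD_START_TIME", "ID"]

result = (
    df.sort_values("VALUE", ascending=False)  # largest VALUE first
      .drop_duplicates(cols)                  # keep the first row per key pair
      .sort_values(cols)                      # restore key order
      .reset_index(drop=True)
)

print(result)
```

Like the idxmax version, this keeps complete rows, so all other columns are taken from the row that holds the maximum.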

This post is based on a similar question found on Stack Overflow about python - Group by two columns and take the maximum of a third column in pandas: https://stackoverflow.com/questions/44776593/
