
python - Pandas groupby : 3 max per period among multiple columns

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:22:54

I have this data:

                     val1  val2  val3
dt
2017-12-15 00:00:00 81 90 79
2017-12-15 00:01:00 67 85 80
2017-12-15 00:02:00 4 41 37
2017-12-15 00:03:00 61 68 29
2017-12-15 00:04:00 49 6 56
2017-12-15 00:05:00 94 13 93
2017-12-15 00:06:00 91 3 75
2017-12-15 00:07:00 94 81 7
2017-12-15 00:08:00 55 59 33
2017-12-15 00:09:00 97 89 26
2017-12-15 00:10:00 17 75 88
2017-12-15 00:11:00 39 40 96
2017-12-15 00:12:00 61 20 70
2017-12-15 00:13:00 62 31 93
2017-12-15 00:14:00 7 26 29

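I want to find the 3 largest values for each 5-minute period. The maxima may sit in any column (val1, val2, val3) and must be searched among all 15 values available within each 5-minute window.

So far I can only find the largest values within a single column. Is it possible to search for nlargest across multiple columns?

Here is the code that generates the data and finds the largest values of val1: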

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

date_ref = datetime(2017, 12, 15, 0,0,0)
dtime = pd.date_range(date_ref, freq='1min', periods=15)

np.random.seed(seed=1115)
data1 = np.random.randint(1, high=100, size=len(dtime))
data2 = np.random.randint(1, high=100, size=len(dtime))
data3 = np.random.randint(1, high=100, size=len(dtime))

df = pd.DataFrame({'dt': dtime, 'val1': data1, 'val2': data2, 'val3': data3})
df.set_index('dt', inplace=True)

print(df)

group = df.groupby(pd.Grouper(freq='5min'))

max_only_for_val1 = (pd.DataFrame(
group["val1"]
.nlargest(3))
.reset_index(level=1, drop=True)
)

print(max_only_for_val1)

Here is the output:

                     val1
dt
2017-12-15 00:00:00 81
2017-12-15 00:00:00 67
2017-12-15 00:00:00 61
2017-12-15 00:05:00 97
2017-12-15 00:05:00 94
2017-12-15 00:05:00 94
2017-12-15 00:10:00 62
2017-12-15 00:10:00 61
2017-12-15 00:10:00 39

Best answer

Since it doesn't matter which column a value comes from, let's reshape your data a little.

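# melt to long format, then drop the column-name label (pandas 2.x requires the columns= keyword)
df = df.reset_index().melt('dt').drop(columns='variable')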
df.head(10)

dt value
0 2017-12-15 00:00:00 81
1 2017-12-15 00:01:00 67
2 2017-12-15 00:02:00 4
3 2017-12-15 00:03:00 61
4 2017-12-15 00:04:00 49
5 2017-12-15 00:05:00 94
6 2017-12-15 00:06:00 91
7 2017-12-15 00:07:00 94
8 2017-12-15 00:08:00 55
9 2017-12-15 00:09:00 97
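
To make the reshape step concrete, here is a minimal sketch on just the first two rows of the question's table. The names `wide` and `long_df` are illustrative and not part of the original answer.

```python
import pandas as pd

# First two rows of the question's data, indexed by timestamp.
wide = pd.DataFrame(
    {"val1": [81, 67], "val2": [90, 85], "val3": [79, 80]},
    index=pd.Index(pd.date_range("2017-12-15", freq="1min", periods=2), name="dt"),
)

# melt stacks val1, val2, val3 on top of each other: 2 rows x 3 columns
# become 6 rows, each carrying its timestamp and a single value.
long_df = wide.reset_index().melt("dt").drop(columns="variable")
print(long_df)
```

After melting, every value sits in one `value` column, so a single-column method is enough to search all three original columns at once.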

Now, call groupby + apply -

def get_max3(x):
    # sort the group's values in descending order and keep the top three
    return x.sort_values(ascending=False).head(3)

df = df.groupby(pd.Grouper(key='dt', freq='5min'))['value']\
.apply(get_max3)\
.reset_index(0)\
.reset_index(drop=True)

dt value
0 2017-12-15 00:00:00 90
1 2017-12-15 00:00:00 85
2 2017-12-15 00:00:00 81
3 2017-12-15 00:05:00 97
4 2017-12-15 00:05:00 94
5 2017-12-15 00:05:00 94
6 2017-12-15 00:10:00 96
7 2017-12-15 00:10:00 93
8 2017-12-15 00:10:00 88
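
The question asked about nlargest specifically. As a sketch of one possible variant (not the answer's own code), once the data is melted the groupby can call nlargest directly instead of a custom function. The values below are copied from the table in the question; `long_df` and `top3` are illustrative names.

```python
import pandas as pd

# Values copied from the table in the question.
dtime = pd.date_range("2017-12-15 00:00:00", freq="1min", periods=15)
df = pd.DataFrame(
    {
        "val1": [81, 67, 4, 61, 49, 94, 91, 94, 55, 97, 17, 39, 61, 62, 7],
        "val2": [90, 85, 41, 68, 6, 13, 3, 81, 59, 89, 75, 40, 20, 31, 26],
        "val3": [79, 80, 37, 29, 56, 93, 75, 7, 33, 26, 88, 96, 70, 93, 29],
    },
    index=pd.Index(dtime, name="dt"),
)

# Long format: one row per (timestamp, value); the source column is dropped.
long_df = df.reset_index().melt("dt").drop(columns="variable")

# nlargest(3) within each 5-minute bin; the result carries a two-level index
# (bin start, original row number), so drop the second level afterwards.
top3 = (
    long_df.groupby(pd.Grouper(key="dt", freq="5min"))["value"]
    .nlargest(3)
    .reset_index(level=1, drop=True)
    .reset_index()
)
print(top3)
```

This should give the same nine values as the groupby + apply version, with each row labelled by the start of its 5-minute bin.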

An alternative definition of get_max3 using numpy.sort -

def get_max3(x):
    # np.sort is ascending; [:-4:-1] takes the last three elements, largest first
    return np.sort(x.values)[:-4:-1]
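
The slice is easy to get wrong, so here is a small check of what it does, using a handful of values from the first 5-minute window. The array `values` is just sample input chosen for illustration.

```python
import numpy as np

# A few values from the first 5-minute window of the question's data.
values = np.array([81, 67, 4, 61, 49, 90, 85])

ascending = np.sort(values)   # smallest to largest: 4, 49, 61, 67, 81, 85, 90

# Start at the last element and step backwards, stopping before index -4,
# which picks indices -1, -2, -3: the three largest, in descending order.
top3 = ascending[:-4:-1]

# By contrast, [-4::-1] starts at index -4 and walks back to the beginning,
# so it returns everything except the three largest.
rest = ascending[-4::-1]

print(top3, rest)
```

Note that this version of get_max3 returns a NumPy array rather than a Series, so the result of apply may be shaped differently from the first version (likely one array per group) and the reset_index chain may need adjusting.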

Regarding python - Pandas groupby : 3 max per period among multiple columns, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47829357/
