
python - Pandas: grouping a DataFrame into user-specified time periods

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:19:18

Possibly related: pandas dataframe group year index by decade

For example, suppose I have data like the following:

                     status  bytes_sent upstream_cache_status  \
timestamp
2014-05-26 23:56:30     200         356                  MISS
2014-05-26 23:56:30     200       10517                     -
2014-05-26 23:57:05     200        6923                  MISS
2014-05-26 23:57:14     200         323                     -
2014-05-26 23:57:30     200         356                  MISS
2014-05-26 23:57:38     200        8107                   HIT
2014-05-26 23:57:43     200         369                  MISS
2014-05-26 23:57:56     304         401                   HIT
2014-05-26 23:57:56     304         401                   HIT
2014-05-26 23:57:56     304         387                  MISS
2014-05-26 23:57:57     304         401                   HIT
2014-05-26 23:57:58     304         401                   HIT
2014-05-26 23:58:08     200         507               EXPIRED
2014-05-26 23:58:29     304         338                   HIT
2014-05-26 23:58:31     400         409                     -
2014-05-26 23:58:45     200         425                  MISS

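If I want to group the rows so that each group contains the log entries from one 30-second window (with the window length specified by the user), how should I do it? I have seen this: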

df.groupby(lambda x: x.hour)

but I very much doubt it applies to my case.

Best answer

df.groupby(pd.Grouper(freq='30S', level=0)) should do it; for example:

>>> aggr = lambda df: df.apply(tuple)
>>> df.groupby(pd.Grouper(freq='30S', level=0)).aggregate(aggr)
                     status                                    bytes_sent                                 \
timestamp
2014-05-26 23:56:30  (200, 200)                                (356, 10517)
2014-05-26 23:57:00  (200, 200)                                (6923, 323)
2014-05-26 23:57:30  (200, 200, 200, 304, 304, 304, 304, 304)  (356, 8107, 369, 401, 401, 387, 401, 401)
2014-05-26 23:58:00  (200, 304)                                (507, 338)
2014-05-26 23:58:30  (400, 200)                                (409, 425)

                     upstream_cache_status
timestamp
2014-05-26 23:56:30  (MISS, -)
2014-05-26 23:57:00  (MISS, -)
2014-05-26 23:57:30  (MISS, HIT, MISS, HIT, HIT, MISS, HIT, HIT)
2014-05-26 23:58:00  (EXPIRED, HIT)
2014-05-26 23:58:30  (-, MISS)
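
Since the question asks for a window length chosen by the user, here is a minimal sketch of how the same idea might be wrapped in a function. The helper name group_by_period, the summary column names requests and total_bytes, and the six-row subset of the sample log are assumptions made for illustration; they are not part of the original answer. The lowercase 's' alias is used because recent pandas releases deprecate the uppercase 'S' spelling.

```python
import pandas as pd

# A few rows copied from the question's sample log, indexed by timestamp.
df = pd.DataFrame(
    {
        "status": [200, 200, 200, 200, 200, 304],
        "bytes_sent": [356, 10517, 6923, 323, 356, 401],
        "upstream_cache_status": ["MISS", "-", "MISS", "-", "MISS", "HIT"],
    },
    index=pd.to_datetime(
        [
            "2014-05-26 23:56:30",
            "2014-05-26 23:56:30",
            "2014-05-26 23:57:05",
            "2014-05-26 23:57:14",
            "2014-05-26 23:57:30",
            "2014-05-26 23:57:56",
        ]
    ).rename("timestamp"),
)


def group_by_period(frame, seconds):
    """Hypothetical helper: group rows into fixed-width bins of `seconds` seconds."""
    # level=0 tells the Grouper to bin on the DatetimeIndex rather than a column.
    return frame.groupby(pd.Grouper(freq=f"{seconds}s", level=0))


# Summarize each 30-second bin: number of requests and total bytes sent.
summary = group_by_period(df, 30).agg(
    requests=("status", "count"),
    total_bytes=("bytes_sent", "sum"),
)
print(summary)
```

With the default settings the bins are aligned to multiples of the period counted from midnight, which is why the row at 23:57:05 falls into the bin labelled 23:57:00. Calling group_by_period(df, 60) instead would give one-minute bins without any other change.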

Regarding python - Pandas: grouping a DataFrame into user-specified time periods, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24428856/
