
python - Pandas: group by with a rolling open window

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:00:02

Suppose we have a df like this (a user may have several rows on the same date):

import pandas as pd

df = pd.DataFrame({"user_id" : ["A"] * 5 + ["B"] * 5,
                   "hour" : [10] * 10,
                   "date" : ["2018-01-16", "2018-01-16", "2018-01-18", "2018-01-19", "2018-02-16"] * 2,
                   "amount" : [1] * 10})
df['date'] = pd.to_datetime(df['date'])  # parse the strings into datetime64 values

Output:

   amount       date  hour user_id
0       1 2018-01-16    10       A
1       1 2018-01-16    10       A
2       1 2018-01-18    10       A
3       1 2018-01-19    10       A
4       1 2018-02-16    10       A
5       1 2018-01-16    10       B
6       1 2018-01-16    10       B
7       1 2018-01-18    10       B
8       1 2018-01-19    10       B
9       1 2018-02-16    10       B

I want rolling statistics of amount, aggregated per user_id and hour. Currently I do it like this:

def get_rolling_stats(df, rolling_interval = 7):
    index_cols = ['user_id', 'hour', 'date']
    grp = df.groupby(by = ['user_id', 'hour'], as_index = True, group_keys = False).rolling(window='%sD'%rolling_interval, on = 'date')

    def agg_grp(grp, func):
        res = grp.agg({'amount' : func})
        res = res.reset_index()
        # keep only the last row for each (user_id, hour, date)
        res.drop_duplicates(index_cols, inplace = True, keep = 'last')
        res.rename(columns = {'amount' : "amount_" + func}, inplace = True)
        return res

    grp1 = agg_grp(grp, "mean")
    grp2 = agg_grp(grp, "count")

    grp = grp1.merge(grp2, on = index_cols)
    return grp

So it outputs:

  user_id  hour       date  amount_mean  amount_count
0       A    10 2018-01-16          1.0           1.0
1       A    10 2018-01-18          1.0           3.0
2       A    10 2018-01-19          1.0           4.0
3       A    10 2018-02-16          1.0           1.0
4       B    10 2018-01-16          1.0           1.0
5       B    10 2018-01-18          1.0           3.0
6       B    10 2018-01-19          1.0           4.0
7       B    10 2018-02-16          1.0           1.0

But I want to exclude the current date from the rolling window, so the output I want is:

  user_id  hour       date  amount_mean  amount_count
0       A    10 2018-01-16          NaN           0.0
1       A    10 2018-01-18          1.0           2.0
2       A    10 2018-01-19          1.0           3.0
3       A    10 2018-02-16          NaN           0.0
4       B    10 2018-01-16          NaN           0.0
5       B    10 2018-01-18          1.0           2.0
6       B    10 2018-01-19          1.0           3.0
7       B    10 2018-02-16          NaN           0.0

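I read that the rolling method has a closed argument. But when I use it, it raises an error: ValueError: closed only implemented for datetimelike and offset based windows. I have not found any example of how to use it. Can someone shed some light on how to implement the get_rolling_stats function correctly?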
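For reference, here is a minimal sketch of what closed does on a plain Series with a DatetimeIndex. The series s and the variable names are made up for illustration, and the dates are kept unique on purpose so that duplicate timestamps do not blur the picture. With an offset window such as '7D', the default is closed='right', which covers (t - 7D, t] and so includes the current row; closed='neither' covers (t - 7D, t) and leaves it out.

```python
import pandas as pd

# Illustrative series: one observation per date, indexed by datetime.
s = pd.Series(
    [1, 1, 1, 1],
    index=pd.to_datetime(["2018-01-16", "2018-01-18", "2018-01-19", "2018-02-16"]),
)

# Default for offset windows: closed="right", the current row is counted.
count_right = s.rolling("7D").count()

# closed="neither": both endpoints are open, the current row is excluded.
count_neither = s.rolling("7D", closed="neither").count()
mean_neither = s.rolling("7D", closed="neither").mean()

print(count_right.tolist())   # [1.0, 2.0, 3.0, 1.0]
print(count_neither)          # 2018-01-18 sees 1 earlier row, 2018-01-19 sees 2
print(mean_neither)           # NaN wherever the window is empty
```

Note that the index must be datetime-like (or the on column must be) for an offset window like '7D' to be accepted at all.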

Best answer

It seems I found an example: https://pandas.pydata.org/pandas-docs/stable/computation.html#rolling-window-endpoints . All I had to do was replace:

grp = df.groupby(by = ['user_id', 'hour'], as_index = True, group_keys = False).rolling(window='%sD'%rolling_interval, on = 'date')

with:

grp = df.set_index('date').groupby(by = ['user_id', 'hour'], as_index = True, group_keys = False).\
    rolling(window='%sD'%rolling_interval, closed = 'neither')
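Below is a self-contained sketch of the same idea that reproduces the desired table from the question. It is not the code from the answer: the function name get_rolling_stats_excl_today and the helper columns day_sum and day_count are hypothetical, and it swaps in one extra step, collapsing same-day rows into a daily sum and count before rolling. The assumption behind that step is that how rows sharing one timestamp are treated at an open window edge may depend on the pandas version, so making each (user_id, hour, date) unique first sidesteps the question.

```python
import pandas as pd

def get_rolling_stats_excl_today(df, rolling_interval=7):
    """Rolling mean/count of amount per (user_id, hour), excluding the current date."""
    # Collapse same-day rows so every (user_id, hour, date) appears once.
    daily = (
        df.groupby(["user_id", "hour", "date"])["amount"]
        .agg(day_sum="sum", day_count="count")
        .reset_index()
    )
    window = "%sD" % rolling_interval
    pieces = []
    for (user_id, hour), g in daily.groupby(["user_id", "hour"]):
        g = g.set_index("date").sort_index()
        # Open window (t - window, t): rows dated exactly t are left out.
        rolled = g[["day_sum", "day_count"]].rolling(window, closed="neither").sum()
        pieces.append(pd.DataFrame({
            "user_id": user_id,
            "hour": hour,
            "date": rolled.index,
            # NaN / NaN stays NaN when the window is empty.
            "amount_mean": (rolled["day_sum"] / rolled["day_count"]).to_numpy(),
            "amount_count": rolled["day_count"].fillna(0.0).to_numpy(),
        }))
    return pd.concat(pieces, ignore_index=True)

# Same data as in the question.
df = pd.DataFrame({
    "user_id": ["A"] * 5 + ["B"] * 5,
    "hour": [10] * 10,
    "date": pd.to_datetime(
        ["2018-01-16", "2018-01-16", "2018-01-18", "2018-01-19", "2018-02-16"] * 2
    ),
    "amount": [1] * 10,
})
result = get_rolling_stats_excl_today(df)
print(result)
```

One thing to keep in mind: with closed='neither' a '7D' window spans (t - 7D, t), so a row dated exactly seven days earlier is also excluded. If that day should count, closed='left' gives [t - 7D, t) instead.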

Regarding python - Pandas: group by with a rolling open window, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49322597/
