
python-3.x - How to compute a distinct count over a rolling 3-day window in Pandas?

Reposted. Author: 行者123. Updated: 2023-12-04 07:34:20

I want to count the unique customers within a 3-day window, grouped by city.
Input:

    import pandas as pd

    df = pd.DataFrame([['1A', 'Cairo', '2020-12-01'],
                       ['2A', 'Cairo', '2020-12-01'],
                       ['1A', 'Cairo', '2020-12-02'],
                       ['1A', 'Cairo', '2020-12-03'],
                       ['3A', 'Alex', '2020-12-01'],
                       ['3A', 'Alex', '2020-12-02'],
                       ['3A', 'Alex', '2020-12-03'],
                       ['4A', 'Giza', '2020-12-02'],
                       ['4A', 'Giza', '2020-12-02'],
                       ['5A', 'Giza', '2020-12-03'],
                       ['6A', 'Giza', '2020-12-01']],
                      columns=['customer_id', 'city', 'day'])

Expected output:

    output = pd.DataFrame([['Alex', '2020-12-01', 1],
                           ['Alex', '2020-12-02', 1],
                           ['Alex', '2020-12-03', 1],
                           ['Cairo', '2020-12-01', 2],
                           ['Cairo', '2020-12-02', 2],
                           ['Cairo', '2020-12-03', 2],
                           ['Giza', '2020-12-01', 1],
                           ['Giza', '2020-12-02', 2],
                           ['Giza', '2020-12-03', 3]],
                          columns=['city', 'day', 'unique_customers_last3Days'])

I tried:

    df['day'] = pd.to_datetime(df['day'])
    df.set_index('day', inplace=True)
    df.sort_index(inplace=True)
    df.groupby('city').rolling("3D").agg({'customer_id': 'nunique'})

But it gives me this error:

AttributeError: 'nunique' is not a valid function for 'RollingGroupby' object

Best answer

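Set the DataFrame's index to day and sort by the index values. Then factorize the customer_id column so that each customer ID is assigned a unique integer code, since rolling apply works only on numeric data. Next, group the DataFrame on city and apply a rolling nunique operation with a window size of 3 days. Optionally, drop the duplicate day values within each city, keeping the last row for each day because its window already covers all of that day's rows.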

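    # Assumes df is the input frame from the question, with 'day' still a
    # column that has already been converted with pd.to_datetime.
    df = df.set_index('day').sort_index()
    df['codes'] = df['customer_id'].factorize()[0]

    df.groupby('city')\
      .rolling('3D')['codes'].apply(pd.Series.nunique)\
      .reset_index(name='unique').drop_duplicates(['city', 'day'], keep='last')
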
         city        day  unique
    0    Alex 2020-12-01     1.0
    1    Alex 2020-12-02     1.0
    2    Alex 2020-12-03     1.0
    4   Cairo 2020-12-01     2.0
    5   Cairo 2020-12-02     2.0
    6   Cairo 2020-12-03     2.0
    7    Giza 2020-12-01     1.0
    9    Giza 2020-12-02     2.0
    10   Giza 2020-12-03     3.0
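
For anyone who wants to run the whole thing from scratch, here is a minimal end-to-end sketch of the approach above. It is not part of the original answer: the helper name rolling_unique_customers, the final cast to int, and the index reset are assumptions added so that the result lines up with the expected output in the question.

```python
import pandas as pd

# Input data copied from the question.
df = pd.DataFrame(
    [['1A', 'Cairo', '2020-12-01'],
     ['2A', 'Cairo', '2020-12-01'],
     ['1A', 'Cairo', '2020-12-02'],
     ['1A', 'Cairo', '2020-12-03'],
     ['3A', 'Alex', '2020-12-01'],
     ['3A', 'Alex', '2020-12-02'],
     ['3A', 'Alex', '2020-12-03'],
     ['4A', 'Giza', '2020-12-02'],
     ['4A', 'Giza', '2020-12-02'],
     ['5A', 'Giza', '2020-12-03'],
     ['6A', 'Giza', '2020-12-01']],
    columns=['customer_id', 'city', 'day'],
)


def rolling_unique_customers(frame, window='3D'):
    """Hypothetical helper: distinct customers per city over a trailing time window."""
    out = frame.copy()
    # A time-based window such as '3D' needs a sorted datetime index.
    out['day'] = pd.to_datetime(out['day'])
    out = out.set_index('day').sort_index()
    # Map each customer ID to an integer code so that rolling().apply() can handle it.
    out['codes'] = out['customer_id'].factorize()[0]
    result = (
        out.groupby('city')
           .rolling(window)['codes']
           .apply(pd.Series.nunique)
           .reset_index(name='unique_customers_last3Days')
           # Keep the last row of each (city, day): its window covers all rows of that day.
           .drop_duplicates(['city', 'day'], keep='last')
           .reset_index(drop=True)
    )
    # Assumption: counts are whole numbers, so cast the float result to int.
    result['unique_customers_last3Days'] = result['unique_customers_last3Days'].astype(int)
    return result


result = rolling_unique_customers(df)
print(result)
```

The window argument is passed straight to rolling, so other offsets such as '7D' should work the same way as long as day parses as a datetime.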

Regarding "python-3.x - How to compute a distinct count over a rolling 3-day window in Pandas?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67807136/
