
python - Groupby customer and shop - get average transaction frequency. Date problem

Reposted. Author: 行者123. Updated: 2023-12-04 15:17:11

I have the following transactions DataFrame. Dates are in year-month-day format.

print(df)

customer_id shop date_of_transaction
0 John McDonalds 2020-02-03
1 John McDonalds 2020-02-04
2 John McDonalds 2020-02-05
3 John KFC 2020-02-06
4 John KFC 2020-02-07
5 John KFC 2020-02-08
6 Mary McDonalds 2020-02-09
7 Mary McDonalds 2020-02-10
8 Mary McDonalds 2020-02-11
9 Mary KFC 2020-02-12
10 Mary KFC 2020-02-13
11 Joe KFC 2020-02-14
12 Joe McDonalds 2020-02-15
13 Joe McDonalds 2020-02-16
14 Joe McDonalds 2020-02-17
15 Joe KFC 2020-02-18
16 Joe KFC 2020-02-19
17 Joe KFC 2020-02-20
18 Joe McDonalds 2020-02-21

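I want to get the average transaction frequency for each shop.

For example, Joe went to McDonalds 4 times between 15 February and 21 February. There are 6 days between his first and last transaction, so he goes to McDonalds once every 1.5 days.

I want to build a new dataframe from this information, so I tried this: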

df.groupby(['customer_id','shop'])['date_of_transaction'].apply(lambda x: (max(x) - min(x)) / len(x))

customer_id shop
Joe KFC 1 days 12:00:00
McDonalds 1 days 12:00:00
John KFC 0 days 16:00:00
McDonalds 0 days 16:00:00
Mary KFC 0 days 12:00:00
McDonalds 0 days 16:00:00

Here Joe's average frequency for McDonalds comes out as 1 day. It should be 1.5 days.

If I remove the division, we get:

df.groupby(['customer_id','shop'])['date_of_transaction'].apply(lambda x: max(x) - min(x))

customer_id shop
Joe KFC 6 days
McDonalds 6 days
John KFC 2 days
McDonalds 2 days
Mary KFC 1 days
McDonalds 2 days

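It is only when I try to divide this by the number of visits each person makes to each shop that it stops working.

I tried adding astype(int) to (max(x) - min(x)) but it did not work. I know this is a problem with the timedelta object, but I cannot convert it to an int. I also tried adding .dt.days to the timedelta object, without success.

Ideally, I would like to end up with a dataframe like this (note: the frequency numbers are made up):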

  customer_id  McDonalds Frequency  KFC Frequency
0 John 1 2
1 Mary 3 4
2 Joe 5 6

My practice df. If you load the df, you can convert the dates with dayfirst:

df['date_of_transaction'] = pd.to_datetime(df['date_of_transaction'],dayfirst=True)


df.to_dict()

{'customer_id': {0: 'John', 1: 'John', 2: 'John', 3: 'John', 4: 'John', 5: 'John', 6: 'Mary', 7: 'Mary', 8: 'Mary', 9: 'Mary', 10: 'Mary', 11: 'Joe', 12: 'Joe', 13: 'Joe', 14: 'Joe', 15: 'Joe', 16: 'Joe', 17: 'Joe', 18: 'Joe'}, 'shop': {0: 'McDonalds', 1: 'McDonalds', 2: 'McDonalds', 3: 'KFC', 4: 'KFC', 5: 'KFC', 6: 'McDonalds', 7: 'McDonalds', 8: 'McDonalds', 9: 'KFC', 10: 'KFC', 11: 'KFC', 12: 'McDonalds', 13: 'McDonalds', 14: 'McDonalds', 15: 'KFC', 16: 'KFC', 17: 'KFC', 18: 'McDonalds'}, 'date_of_transaction': {0: Timestamp('2020-02-03 00:00:00'), 1: Timestamp('2020-02-04 00:00:00'), 2: Timestamp('2020-02-05 00:00:00'), 3: Timestamp('2020-02-06 00:00:00'), 4: Timestamp('2020-02-07 00:00:00'), 5: Timestamp('2020-02-08 00:00:00'), 6: Timestamp('2020-02-09 00:00:00'), 7: Timestamp('2020-02-10 00:00:00'), 8: Timestamp('2020-02-11 00:00:00'), 9: Timestamp('2020-02-12 00:00:00'), 10: Timestamp('2020-02-13 00:00:00'), 11: Timestamp('2020-02-14 00:00:00'), 12: Timestamp('2020-02-15 00:00:00'), 13: Timestamp('2020-02-16 00:00:00'), 14: Timestamp('2020-02-17 00:00:00'), 15: Timestamp('2020-02-18 00:00:00'), 16: Timestamp('2020-02-19 00:00:00'), 17: Timestamp('2020-02-20 00:00:00'), 18: Timestamp('2020-02-21 00:00:00')}}
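The dict above cannot be pasted straight into pd.DataFrame without first importing Timestamp, so here is a minimal sketch that rebuilds the same 19 rows from plain tuples. The variable name visits is my own and is not part of the question.

```python
import pandas as pd

# (customer, shop, day of February 2020) in the same row order as the
# printed frame. The name `visits` is illustrative only.
visits = [
    ("John", "McDonalds", 3), ("John", "McDonalds", 4), ("John", "McDonalds", 5),
    ("John", "KFC", 6), ("John", "KFC", 7), ("John", "KFC", 8),
    ("Mary", "McDonalds", 9), ("Mary", "McDonalds", 10), ("Mary", "McDonalds", 11),
    ("Mary", "KFC", 12), ("Mary", "KFC", 13),
    ("Joe", "KFC", 14),
    ("Joe", "McDonalds", 15), ("Joe", "McDonalds", 16), ("Joe", "McDonalds", 17),
    ("Joe", "KFC", 18), ("Joe", "KFC", 19), ("Joe", "KFC", 20),
    ("Joe", "McDonalds", 21),
]

df = pd.DataFrame(
    {
        "customer_id": [c for c, _, _ in visits],
        "shop": [s for _, s, _ in visits],
        # Build real datetime64 values so date arithmetic works later.
        "date_of_transaction": [pd.Timestamp(2020, 2, d) for _, _, d in visits],
    }
)
print(df.dtypes)
```

Building the dates as Timestamp objects up front means no to_datetime call is needed afterwards.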

Best answer

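The problem here is that you are reading the returned value as a date, when it is actually a timedelta: 1 days 12:00:00 means one day and 12 hours have elapsed, which is 1.5 days.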
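To make that concrete, here is a small sketch using Joe's McDonalds dates from the question. It shows that dividing a Timedelta by a count still gives a Timedelta, and two ways I would expect to turn it into a plain float number of days. The variable names are my own.

```python
import pandas as pd

# Joe's McDonalds visits from the question: first and last date, 4 visits.
first = pd.Timestamp("2020-02-15")
last = pd.Timestamp("2020-02-21")
n_visits = 4

span = last - first          # a pd.Timedelta of 6 days
per_visit = span / n_visits  # still a Timedelta, printed as "1 days 12:00:00"

# Dividing by a one-day Timedelta yields a plain float.
as_days = per_visit / pd.Timedelta(days=1)
# total_seconds() is an equivalent route.
as_days_alt = per_visit.total_seconds() / 86400

print(per_visit)       # 1 days 12:00:00
print(as_days)         # 1.5
print(per_visit.days)  # 1 (the .days attribute drops the 12 hours)
```

Note that .days on a scalar Timedelta keeps only the whole-day component, and that a scalar has no .dt accessor (that accessor lives on a Series), which is likely why the attempts in the question did not behave as hoped.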

Let's reshape your data slightly with an aggregated groupby, then we can use np.timedelta64 to convert the timedelta into a number of days:

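import numpy as np
import pandas as pd

# first visit, last visit and number of visits per customer and shop
df1 = df.groupby(['customer_id','shop']).agg(mind=('date_of_transaction','min'),
                                             maxd=('date_of_transaction','max'),
                                             no_visits=('customer_id','count')).reset_index(0)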


print(df1)

          customer_id       mind       maxd  no_visits
shop
KFC               Joe 2020-02-14 2020-02-20          4
McDonalds         Joe 2020-02-15 2020-02-21          4
KFC              John 2020-02-06 2020-02-08          3
McDonalds        John 2020-02-03 2020-02-05          3
KFC              Mary 2020-02-12 2020-02-13          2
McDonalds        Mary 2020-02-09 2020-02-11          3

df1['timedelta'] = ((df1['maxd'] - df1['mind']) / df1['no_visits']) / np.timedelta64(1,'D')

          customer_id       mind       maxd  no_visits  timedelta
shop
KFC               Joe 2020-02-14 2020-02-20          4   1.500000
McDonalds         Joe 2020-02-15 2020-02-21          4   1.500000
KFC              John 2020-02-06 2020-02-08          3   0.666667
McDonalds        John 2020-02-03 2020-02-05          3   0.666667
KFC              Mary 2020-02-12 2020-02-13          2   0.500000
McDonalds        Mary 2020-02-09 2020-02-11          3   0.666667

Then we use a crosstab:

df2 = (
    pd.crosstab(df1["customer_id"], df1.index, df1["timedelta"], aggfunc="first")
    .add_suffix("_visits")
    .reset_index(0)
)

print(df2)

col_0 customer_id KFC_visits McDonalds_visits
0 Joe 1.500000 1.500000
1 John 0.666667 0.666667
2 Mary 0.500000 0.666667
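As each customer and shop pair appears exactly once at this stage, the same wide layout can also be reached with pivot instead of crosstab. The sketch below is an alternative, not the answer's own method. It starts from a small hand-typed long table (the name long_df is hypothetical) holding the per-pair values shown above.

```python
import pandas as pd

# One row per (customer, shop) pair with the days-per-visit figure.
# `long_df` is a stand-in for the reshaped data, typed in by hand here.
long_df = pd.DataFrame(
    {
        "customer_id": ["Joe", "Joe", "John", "John", "Mary", "Mary"],
        "shop": ["KFC", "McDonalds"] * 3,
        "timedelta": [1.5, 1.5, 2 / 3, 2 / 3, 0.5, 2 / 3],
    }
)

wide = (
    long_df.pivot(index="customer_id", columns="shop", values="timedelta")
    .add_suffix("_visits")        # KFC -> KFC_visits, and so on
    .rename_axis(columns=None)    # drop the leftover "shop" columns label
    .reset_index()
)
print(wide)
```

pivot raises an error if a customer and shop pair is duplicated, which acts as a cheap sanity check that the aggregation step really produced one row per pair.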

Or a neat one-liner from our resident guru, Scott Boston:

df.groupby(["customer_id", "shop"])["date_of_transaction"].agg(
    lambda x: (np.ptp(x) / np.timedelta64(1, "D")) / x.count()
).unstack(1).add_suffix('_visits')


shop KFC_visits McDonalds_visits
customer_id
Joe 1.500000 1.500000
John 0.666667 0.666667
Mary 0.500000 0.666667
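If np.ptp on a datetime Series gives trouble on your NumPy and pandas versions (I have not checked every combination), the same calculation can be sketched with the groupby max, min and count methods. This variant is my own, not from the answer, and it names the columns the way the question's desired output does. The days dict is a compact restatement of the question's data.

```python
import pandas as pd

# Day of February 2020 for each visit, copied from the question's data.
days = {
    ("John", "McDonalds"): [3, 4, 5],
    ("John", "KFC"): [6, 7, 8],
    ("Mary", "McDonalds"): [9, 10, 11],
    ("Mary", "KFC"): [12, 13],
    ("Joe", "KFC"): [14, 18, 19, 20],
    ("Joe", "McDonalds"): [15, 16, 17, 21],
}
df = pd.DataFrame(
    [(c, s, pd.Timestamp(2020, 2, d)) for (c, s), ds in days.items() for d in ds],
    columns=["customer_id", "shop", "date_of_transaction"],
)

g = df.groupby(["customer_id", "shop"])["date_of_transaction"]

# (last visit - first visit) in days, divided by the number of visits.
days_per_visit = (g.max() - g.min()) / pd.Timedelta(days=1) / g.count()

freq = (
    days_per_visit.unstack("shop")
    .add_suffix(" Frequency")
    .rename_axis(columns=None)
    .reset_index()
)
print(freq)
```

This keeps the question's definition (span divided by visit count). Dividing by count minus one would instead give the average gap between consecutive visits, which is a different quantity.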

Regarding python - Groupby customer and shop - get average transaction frequency. Date problem, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64123587/
