
python - Calculating the nth day since the first event in each group in Pandas

Reposted. Author: 行者123. Updated: 2023-11-28 19:25:59

This is a follow-up to my other question:

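I have the following dataframe, a subset of my original dataframe, with the columns ob, event, unixtime, and day. I want to add another column, arbday, which is the nth day since the first event within each ob group (the first visit is day 1):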

import numpy as np
import pandas as pd
import datetime as dt

>>> newdf = pd.DataFrame({'ob': ['a','a','b','b','c', 'd', 'e', 'e', 'e', 'f', 'f', 'f'],'event': [1, 2, 1, 2, 1, 1, 1, 2, 3, 1, 2, 3], 'unixtime': [1346682124716, 1346682188598, 1346745432765, 1347080641650, 1346676710509, 1346702995184, 1346530405978, 1346530421609, 1346530570952, 1346617885925, 1346961625305,1347214217566]},index=[343340, 343341, 343342, 343343, 343344, 343345, 343349, 343350, 343351, 343352,343353,343354])
>>> newdf['day'] = newdf['unixtime'].apply(lambda x: dt.datetime.utcfromtimestamp(x/1000).date())

       ob  event       unixtime         day  arbday
343340  a      1  1346682124716  2012-09-03       1
343341  a      2  1346682188598  2012-09-03       1
343342  b      1  1346745432765  2012-09-04       1
343343  b      2  1347080641650  2012-09-08       5
343344  c      1  1346676710509  2012-09-03       1
343345  d      1  1346702995184  2012-09-03       1
343349  e      1  1346530405978  2012-09-01       1
343350  e      2  1346530421609  2012-09-01       1
343351  e      3  1346530570952  2012-09-01       1
343352  f      1  1346617885925  2012-09-02       1
343353  f      2  1346961625305  2012-09-06       5
343354  f      3  1347214217566  2012-09-09       8

Within a single ob, either of these works:

newdf['arbday'] = newdf['day'].map(lambda x: (x-newdf.get_value(newdf[newdf.event == 1].first_valid_index(), 'day')).days+1)

newdf['arbday'] = newdf['day'].map(lambda x: (x-newdf.get_value(int(newdf[newdf.event == 1].index), 'day')).days+1)
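Note that `DataFrame.get_value` was removed in pandas 1.0, so the two lines above only run on older versions. A minimal sketch of the same single-ob idea using `.at` follows; the frame name `sub` and its two rows (taken from ob `b`) are assumptions made for illustration.

```python
import datetime as dt

import pandas as pd

# Hypothetical single-ob subset (the two rows of ob 'b' from the question).
sub = pd.DataFrame(
    {
        "event": [1, 2],
        "day": [dt.date(2012, 9, 4), dt.date(2012, 9, 8)],
    },
    index=[343342, 343343],
)

# Index label of the first row whose event is 1.
first_idx = sub[sub.event == 1].first_valid_index()

# Scalar lookup by label; .at stands in for the removed get_value.
first_day = sub.at[first_idx, "day"]

# Day offset from the first event, counting the first visit as day 1.
sub["arbday"] = sub["day"].map(lambda x: (x - first_day).days + 1)
```

As in the original lines, this assumes the frame holds exactly one ob; with several obs mixed together every row would be measured against the same first event.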

I tried the code below, and it worked:

>>> newdf['arbday'] = newdf.groupby('ob')['day'].transform(lambda x: (x-x.min()).apply(lambda y: y.days)+1)

        event ob       unixtime         day  arbday
343340      1  a  1346682124716  2012-09-03       1
343341      2  a  1346682188598  2012-09-03       1
343342      1  b  1346745432765  2012-09-04       1
343343      2  b  1347080641650  2012-09-08       5
343344      1  c  1346676710509  2012-09-03       1
343345      1  d  1346702995184  2012-09-03       1
343349      1  e  1346530405978  2012-09-01       1
343350      2  e  1346530421609  2012-09-01       1
343351      3  e  1346530570952  2012-09-01       1
343352      1  f  1346617885925  2012-09-02       1
343353      2  f  1346961625305  2012-09-06       5
343354      3  f  1347214217566  2012-09-09       8
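The nested lambdas in the transform can probably be avoided by keeping `day` as a datetime64 column instead of Python `date` objects. The sketch below is one way to do that, not necessarily what the asker had in mind: it assumes the unixtime values are UTC milliseconds and uses the earliest day per ob as a stand-in for the day of event 1.

```python
import pandas as pd

# Same data as in the question.
newdf = pd.DataFrame(
    {
        "ob": ["a", "a", "b", "b", "c", "d", "e", "e", "e", "f", "f", "f"],
        "event": [1, 2, 1, 2, 1, 1, 1, 2, 3, 1, 2, 3],
        "unixtime": [
            1346682124716, 1346682188598, 1346745432765, 1347080641650,
            1346676710509, 1346702995184, 1346530405978, 1346530421609,
            1346530570952, 1346617885925, 1346961625305, 1347214217566,
        ],
    },
    index=[343340, 343341, 343342, 343343, 343344, 343345,
           343349, 343350, 343351, 343352, 343353, 343354],
)

# Parse epoch milliseconds (assumed UTC) and truncate to midnight.
newdf["day"] = pd.to_datetime(newdf["unixtime"], unit="ms").dt.normalize()

# Earliest day in each ob, broadcast back to every row of that ob.
first_day = newdf.groupby("ob")["day"].transform("min")

# Timedelta to whole days, plus 1 so that the first visit is day 1.
newdf["arbday"] = (newdf["day"] - first_day).dt.days + 1
```

Because the subtraction and `.dt.days` operate on whole columns, no per-row Python function is called.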

But this is clearly not the most elegant way. Also, why did the order of the event and ob columns change?

Any pointers would be much appreciated. Thanks!
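On the column-order question, a likely explanation is that older pandas versions sorted dict keys alphabetically when building a DataFrame from a dict, which puts event before ob. This is an assumption about the version in use, not something the post confirms. A small sketch of pinning the order explicitly with the `columns` argument, using a two-row sample of the data:

```python
import pandas as pd

# Two-row sample of the question's data, for illustration only.
data = {
    "ob": ["a", "b"],
    "event": [1, 1],
    "unixtime": [1346682124716, 1346745432765],
}

# Passing columns= fixes the column order regardless of how the dict is ordered.
newdf = pd.DataFrame(data, columns=["ob", "event", "unixtime"])

# An existing frame can be reordered by selecting the columns in the wanted order.
reordered = newdf[["event", "ob", "unixtime"]]
```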

Best answer

In [46]: firstdays = df.groupby('ob').day.first()

In [47]: firstdays
Out[47]:
ob
a 2012-09-03
b 2012-09-04
c 2012-09-03
d 2012-09-03
e 2012-09-01
f 2012-09-02
Name: day

In [48]: df.apply(lambda row: (row['day'] - firstdays[row['ob']]).days + 1, axis=1)
Out[48]:
343340 1
343341 1
343342 1
343343 5
343344 1
343345 1
343349 1
343350 1
343351 1
343352 1
343353 5
343354 8
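The row-wise `apply` above calls a Python function once per row. A possible vectorized variant of the same idea is to map each row's ob to its first day and subtract whole columns. This is a sketch under two assumptions: `day` is stored as datetime64 rather than `date` objects, and rows within each ob are already in chronological order, so `first()` returns the first event's day.

```python
import pandas as pd

# Same ob and unixtime values as in the question.
df = pd.DataFrame(
    {
        "ob": ["a", "a", "b", "b", "c", "d", "e", "e", "e", "f", "f", "f"],
        "unixtime": [
            1346682124716, 1346682188598, 1346745432765, 1347080641650,
            1346676710509, 1346702995184, 1346530405978, 1346530421609,
            1346530570952, 1346617885925, 1346961625305, 1347214217566,
        ],
    },
    index=[343340, 343341, 343342, 343343, 343344, 343345,
           343349, 343350, 343351, 343352, 343353, 343354],
)
df["day"] = pd.to_datetime(df["unixtime"], unit="ms").dt.normalize()

# First day per ob, indexed by ob (as in the answer).
firstdays = df.groupby("ob")["day"].first()

# Look up each row's first day by its ob label, then subtract column-wise.
arbday = (df["day"] - df["ob"].map(firstdays)).dt.days + 1
```

The result keeps the original index, so it can be assigned straight back with `df['arbday'] = arbday`.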

Regarding python - Calculating the nth day since the first event in each group in Pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13175251/
