gpt4 book ai didi

python - Pandas 数据帧 : difference between all dates for each unique id

转载 作者:太空宇宙 更新时间:2023-11-03 21:16:27 25 4
gpt4 key购买 nike

[In 621]: df = pd.DataFrame({'id':[44,44,44,88,88,90,95],
'Status': ['Reject','Submit','Draft','Accept','Submit',
'Submit','Draft'],
'Datetime': ['2018-11-24 08:56:02',
'2018-10-24 18:12:02','2018-10-24 08:12:02',
'2018-10-29 13:17:02','2018-10-24 10:12:02',
'2018-12-30 08:43:12', '2019-01-24 06:12:02']
}, columns = ['id','Status', 'Datetime'])
df['Datetime'] = pd.to_datetime(df['Datetime'])
df

Out[621]:
id Status Datetime
0 44 Reject 2018-11-24 08:56:02
1 44 Submit 2018-10-24 18:12:02
2 44 Draft 2018-10-24 08:12:02
3 88 Accept 2018-10-29 13:17:02
4 88 Submit 2018-10-24 10:12:02
5 90 Submit 2018-12-30 08:43:12
6 95 Draft 2019-01-24 06:12:02

我想要得到的是另一列,例如df['Time in Status'] 这是 id 在该状态下花费的时间。

我查看了 df.groupby() 但只找到了用于计算两个日期(例如第一个和最后一个)之间的答案( such as this one ),无论中间有多少个日期。

df['Datetime'] = pd.to_datetime(df['Datetime'])                              
g = df.groupby('id')['Datetime']
print(df.groupby('id')['Datetime'].apply(lambda g: g.iloc[-1] - g.iloc[0]))

id
44 -32 days +23:16:00
88 -6 days +20:55:00
90 0 days 00:00:00
95 0 days 00:00:00
Name: Datetime, dtype: timedelta64[ns]

我最接近得到的结果是 DataFrameGroupBy.diff

df['Time in Status'] = df.groupby('id')['Datetime'].diff()
df
id Status Datetime Time in Status
0 44 Reject 2018-11-24 08:56:02 NaT
1 44 Submit 2018-10-24 18:12:02 -31 days +09:16:00
2 44 Draft 2018-10-24 08:12:02 -1 days +14:00:00
3 88 Accept 2018-10-29 13:17:02 NaT
4 88 Submit 2018-10-24 10:12:02 -6 days +20:55:00
5 90 Submit 2018-12-30 08:43:12 NaT
6 95 Draft 2019-01-24 06:12:02 NaT

但是这有两个问题。首先,我怎样才能从最早的日期开始一直计算到最后呢?例如。那么在 2 行中,不是 -1 days +14:00:00 而是 0 Days 10:00:00?或者通过预先重新排列数据的顺序来解决这个问题是否更容易?

另一个问题是 NaT。如果没有可比较的日期,则将使用当天(即 datetime.now)。之后我可以很容易地应用这个,但我想知道是否有更好的解决方案来查找和替换所有 NaT 值。

最佳答案

没错,你说得对,首先是必要的排序DataFrame.sort_values两列:

df = df.sort_values(['id', 'Datetime'])
df['Time in Status'] = df.groupby('id')['Datetime'].diff()
print (df)
id Status Datetime Time in Status
2 44 Draft 2018-10-24 08:12:02 NaT
1 44 Submit 2018-10-24 18:12:02 0 days 10:00:00
0 44 Reject 2018-11-24 08:56:02 30 days 14:44:00
4 88 Submit 2018-10-24 10:12:02 NaT
3 88 Accept 2018-10-29 13:17:02 5 days 03:05:00
5 90 Submit 2018-12-30 08:43:12 NaT
6 95 Draft 2019-01-24 06:12:02 NaT

关于python - Pandas 数据帧 : difference between all dates for each unique id,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54669337/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com