
python - Finding date range overlap in Python

Reposted. Author: 太空狗. Updated: 2023-10-30 00:43:10

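I am trying to find a more efficient way to detect overlapping date ranges in a dataframe (each row provides a start and end date), grouped by a specific column (id). The dataframe is sorted by the 'from' column. I think there is a way to avoid the double apply that I am using now: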

import pandas as pd
from datetime import datetime

df = pd.DataFrame(columns=['id','from','to'], index=range(5),
                  data=[[878,'2006-01-01','2007-10-01'],
                        [878,'2007-10-02','2008-12-01'],
                        [878,'2008-12-02','2010-04-03'],
                        [879,'2010-04-04','2199-05-11'],
                        [879,'2016-05-12','2199-12-31']])

df['from'] = pd.to_datetime(df['from'])
df['to'] = pd.to_datetime(df['to'])


id from to
0 878 2006-01-01 2007-10-01
1 878 2007-10-02 2008-12-01
2 878 2008-12-02 2010-04-03
3 879 2010-04-04 2199-05-11
4 879 2016-05-12 2199-12-31

I use the "apply" function to loop over all groups, and inside each group I use "apply" again on every row:

def check_date_by_id(df):

    df['prevFrom'] = df['from'].shift()
    df['prevTo'] = df['to'].shift()

    def check_date_by_row(x):

        if pd.isnull(x.prevFrom) or pd.isnull(x.prevTo):
            x['overlap'] = False
            return x

        latest_start = max(x['from'], x.prevFrom)
        earliest_end = min(x['to'], x.prevTo)
        x['overlap'] = int((earliest_end - latest_start).days) + 1 > 0
        return x

    return df.apply(check_date_by_row, axis=1).drop(['prevFrom','prevTo'], axis=1)

df.groupby('id').apply(check_date_by_id)

id from to overlap
0 878 2006-01-01 2007-10-01 False
1 878 2007-10-02 2008-12-01 False
2 878 2008-12-02 2010-04-03 False
3 879 2010-04-04 2199-05-11 False
4 879 2016-05-12 2199-12-31 True

My code was inspired by the following links:

Best answer

You can shift the 'to' column and subtract the datetimes directly.

from datetime import timedelta
df['overlap'] = (df['to'].shift() - df['from']) > timedelta(0)

Applying it while grouping by id might look like

df['overlap'] = (df.groupby('id')
                   .apply(lambda x: (x['to'].shift() - x['from']) > timedelta(0))
                   .reset_index(level=0, drop=True))
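
Since the question asks about avoiding apply altogether, here is a minimal sketch of one possible variant that shifts 'to' inside each group with groupby(...).shift() and compares the result to 'from' directly. It is not part of the original answer. It assumes, as the question states, that rows are sorted by 'from' within each id, and the helper name prev_to is only illustrative.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "id": [878, 878, 878, 879, 879],
    "from": pd.to_datetime(["2006-01-01", "2007-10-02", "2008-12-02",
                            "2010-04-04", "2016-05-12"]),
    "to": pd.to_datetime(["2007-10-01", "2008-12-01", "2010-04-03",
                          "2199-05-11", "2199-12-31"]),
})

# End date of the previous row within the same id (NaT for the first row of each group).
prev_to = df.groupby("id")["to"].shift()

# A row overlaps its predecessor when the predecessor ends after this row starts.
# Comparisons against NaT evaluate to False, so the first row of each group is False.
df["overlap"] = prev_to > df["from"]
print(df)
```

On the sample data this marks only the last row as overlapping, which matches the output shown in the question.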

Demo

>>> df
id from to
0 878 2006-01-01 2007-10-01
1 878 2007-10-02 2008-12-01
2 878 2008-12-02 2010-04-03
3 879 2010-04-04 2199-05-11
4 879 2016-05-12 2199-12-31

>>> df['overlap'] = (df.groupby('id')
...                    .apply(lambda x: (x['to'].shift() - x['from']) > timedelta(0))
...                    .reset_index(level=0, drop=True))

>>> df
id from to overlap
0 878 2006-01-01 2007-10-01 False
1 878 2007-10-02 2008-12-01 False
2 878 2008-12-02 2010-04-03 False
3 879 2010-04-04 2199-05-11 False
4 879 2016-05-12 2199-12-31 True
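
One detail worth checking: the question's row function tests (earliest_end - latest_start).days + 1 > 0, so a range that starts on the very day the previous one ends counts as an overlap, while the answer's strict > timedelta(0) does not. The sketch below illustrates the difference on made-up data (the ids and dates are hypothetical, not from the question), assuming date-only values sorted by 'from' within each id.

```python
import pandas as pd

# Hypothetical data: the second range starts on the day the first one ends.
df = pd.DataFrame({
    "id": [1, 1, 1],
    "from": pd.to_datetime(["2020-01-01", "2020-01-10", "2020-02-01"]),
    "to": pd.to_datetime(["2020-01-10", "2020-01-20", "2020-02-10"]),
})

# End date of the previous row within the same id.
prev_to = df.groupby("id")["to"].shift()

# Strict comparison, as in the answer: touching ranges are not flagged.
strict = prev_to > df["from"]

# Inclusive comparison, closer to the question's "+ 1 > 0" test:
# a shared boundary day is flagged as an overlap.
inclusive = prev_to >= df["from"]

print(pd.DataFrame({"strict": strict, "inclusive": inclusive}))
```

Which comparison is appropriate depends on whether the 'to' date is meant to be included in the range.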

Regarding python - finding date range overlap in Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42462218/
