gpt4 book ai didi

python - 计算python中的逐行时间差

转载 作者:行者123 更新时间:2023-12-03 18:29:18 25 4
gpt4 key购买 nike

我想根据他们第一次上车的时刻和他们离开的时刻之间的差异来计算我的数据框中每个乘客的旅行时间。

这是数据框

my_df = pd.DataFrame({
'id': ['a', 'b', 'b', 'b', 'b', 'b', 'c','d'],
'date': ['2020/02/03', '2020/04/05', '2020/04/05', '2020/04/05','2020/04/06', '2020/04/06', '2020/12/15', '2020/06/23'],
'arriving_time': ['14:36:06', '08:52:02', '08:53:02', '08:55:24', '18:58:03', '19:03:05', '17:04:28', '21:31:23'],
'leaving_time': ['14:40:05', '08:52:41', '08:54:33', '08:57:14', '19:01:07', '19:04:08', '17:09:48', '21:50:12']
})
print(my_df)

output:

id date arriving_time leaving_time
0 a 2020/02/03 14:36:06 14:40:05
1 b 2020/04/05 08:52:02 08:52:41
2 b 2020/04/05 08:53:02 08:54:33
3 b 2020/04/05 08:55:24 08:57:14
4 b 2020/04/06 18:58:03 19:01:07
5 b 2020/04/06 19:03:05 19:04:08
6 c 2020/12/15 17:04:28 17:09:48
7 d 2020/06/23 21:31:23 21:50:12

但是有两个问题(我无法自己解决):
  • 乘客是通过手机信号检测到的,但信号往往不稳定,这就是为什么对于同一个人,我们可以有很多行(如上述数据集中的乘客 b)。 “arriving_time”是检测到信号的时间,“leaving_time”是信号丢失的时间
  • 为了计算旅行时间,我需要将每个唯一 ID 和每次旅行的最近到达时间减去最近离开时间。

  • 这是我想要获得的结果
    id  date    arriving_time   leaving_time    travelTime
    0 a 2020/02/03 14:36:06 14:40:05 00:03:59
    1 b 2020/04/05 08:52:02 08:52:41 00:05:12
    2 b 2020/04/05 08:53:02 08:54:33 00:05:12
    3 b 2020/04/05 08:55:24 08:57:14 00:05:12
    4 b 2020/04/06 18:58:03 19:01:07 00:06:05
    5 b 2020/04/06 19:03:05 19:04:08 00:06:05
    6 c 2020/12/15 17:04:28 17:09:48 00:05:20
    7 d 2020/06/23 21:31:23 21:50:12 00:18:49

    如您所见,乘客 b 在同一天进行了两次不同的旅行,我想知道计算每一次旅行的持续时间。

    我已经尝试了下面的代码,它似乎有效,但它真的很慢(我认为这是由于 my_df 的行数很大)
    for user_id in set(my_df.id):
    for day in set(my_df.loc[my_df.id == user_id, 'date']):
    my_df.loc[(my_df.id == user_id) & (my_df.date == day), 'travelTime'] = max(my_df.loc[(my_df.id == user_id) & (my_df.date == day), 'leaving_time'].apply(pd.to_datetime)) - min(my_df.loc[(my_df.id == user_id) & (my_df.date == day), 'arriving_time'].apply(pd.to_datetime))

    最佳答案

    我认为正确的最大值和最小值将列转换为日期时间,然后减去 Series创建者 GroupBy.transform :

    my_df['s'] = pd.to_datetime(my_df['date'] + ' ' + my_df['arriving_time'])
    my_df['e'] = pd.to_datetime(my_df['date'] + ' ' + my_df['leaving_time'])

    g = my_df.groupby(['id', 'date'])
    my_df['travelTime'] = g['e'].transform('max').sub(g['s'].transform('min'))
    print (my_df)
    id date arriving_time leaving_time s \
    0 a 2020/02/03 14:36:06 14:40:05 2020-02-03 14:36:06
    1 b 2020/04/05 08:52:02 08:52:41 2020-04-05 08:52:02
    2 b 2020/04/05 08:53:02 08:54:33 2020-04-05 08:53:02
    3 b 2020/04/05 08:55:24 08:57:14 2020-04-05 08:55:24
    4 b 2020/04/06 18:58:03 19:01:07 2020-04-06 18:58:03
    5 b 2020/04/06 19:03:05 19:04:08 2020-04-06 19:03:05
    6 c 2020/12/15 17:04:28 17:09:48 2020-12-15 17:04:28
    7 d 2020/06/23 21:31:23 21:50:12 2020-06-23 21:31:23

    e travelTime
    0 2020-02-03 14:40:05 00:03:59
    1 2020-04-05 08:52:41 00:05:12
    2 2020-04-05 08:54:33 00:05:12
    3 2020-04-05 08:57:14 00:05:12
    4 2020-04-06 19:01:07 00:06:05
    5 2020-04-06 19:04:08 00:06:05
    6 2020-12-15 17:09:48 00:05:20
    7 2020-06-23 21:50:12 00:18:49

    为了避免新列,可以使用 DataFrame.assign 系列与 datetimes :
    s = pd.to_datetime(my_df['date'] + ' ' + my_df['arriving_time'])
    e = pd.to_datetime(my_df['date'] + ' ' + my_df['leaving_time'])

    g = my_df.assign(s=s, e=e).groupby(['id', 'date'])
    my_df['travelTime'] = g['e'].transform('max').sub(g['s'].transform('min'))
    print (my_df)
    id date arriving_time leaving_time travelTime
    0 a 2020/02/03 14:36:06 14:40:05 00:03:59
    1 b 2020/04/05 08:52:02 08:52:41 00:05:12
    2 b 2020/04/05 08:53:02 08:54:33 00:05:12
    3 b 2020/04/05 08:55:24 08:57:14 00:05:12
    4 b 2020/04/06 18:58:03 19:01:07 00:06:05
    5 b 2020/04/06 19:03:05 19:04:08 00:06:05
    6 c 2020/12/15 17:04:28 17:09:48 00:05:20
    7 d 2020/06/23 21:31:23 21:50:12 00:18:49

    关于python - 计算python中的逐行时间差,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/60756121/

    25 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com