
r - Updating two columns that depend on each other row by row, using data.table

Repost. Author: 行者123. Updated: 2023-12-01 13:27:41

I want to create a data.table containing the departure and arrival times between bus stops. This is the format of my data.table (reproducible dataset below):

    trip_id stop_sequence arrival_time departure_time travel_time
1: a 1 07:00:00 07:00:00 00:00:00
2: a 2 00:00:00 00:00:00 00:02:41
3: a 3 00:00:00 00:00:00 00:01:36
4: a 4 00:00:00 00:00:00 00:02:39
5: a 5 00:00:00 00:00:00 00:02:28
6: b 1 07:00:00 07:00:00 00:00:00
7: b 2 00:00:00 00:00:00 00:00:45
8: b 3 00:00:00 00:00:00 00:01:36
9: b 4 00:00:00 00:00:00 00:00:37
10: b 5 00:00:00 00:00:00 00:03:00

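Here is how it should work. The idea is that the vehicle travels following the stop sequence. In trip a, for example, the vehicle takes 00:02:41 to travel from stop 1 to stop 2. Given a fixed time of 40 seconds at each stop for passengers to get on and off, the bus would depart from stop 2 at 07:03:21.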
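The arithmetic behind that example can be checked by hand. The sketch below uses base R only and works in whole seconds since midnight; the helpers `to_seconds` and `to_hms` are hypothetical names introduced for illustration, not part of the question.

```r
# Convert "HH:MM:SS" to seconds since midnight (hypothetical helper).
to_seconds <- function(x) {
  parts <- as.integer(strsplit(x, ":", fixed = TRUE)[[1]])
  parts[1] * 3600L + parts[2] * 60L + parts[3]
}

# Convert seconds since midnight back to "HH:MM:SS" (hypothetical helper).
to_hms <- function(s) {
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

dwell <- 40L  # fixed boarding/alighting time at each stop, in seconds

# Trip a: leave stop 1 at 07:00:00, travel 00:02:41 to stop 2.
arrive_stop2 <- to_seconds("07:00:00") + to_seconds("00:02:41")
depart_stop2 <- arrive_stop2 + dwell
# Then travel 00:01:36 to stop 3.
arrive_stop3 <- depart_stop2 + to_seconds("00:01:36")

to_hms(arrive_stop2)
to_hms(depart_stop2)
to_hms(arrive_stop3)
```

Each arrival depends on the previous departure, and each departure depends on the arrival in the same row, which is why the two columns cannot be filled independently.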

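The catch is that this is a row-by-row iterative process between two columns. Intuitively, I would use a for loop with set() in data.table, but I can't get my head around it. Any help?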
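For reference, the loop-based approach mentioned above might look roughly like the sketch below. It is only an illustration under stated assumptions: the table `toy` is made-up data that stores times as integer seconds since midnight instead of chron `times`, and `dwell` is a hypothetical name for the 40-second stop time.

```r
library(data.table)

# Toy data in seconds since midnight (an assumption, to keep the sketch free of chron).
toy <- data.table(
  trip_id        = c("a", "a", "a", "b", "b"),
  stop_sequence  = c(1L, 2L, 3L, 1L, 2L),
  arrival_time   = c(25200L, 0L, 0L, 25200L, 0L),
  departure_time = c(25200L, 0L, 0L, 25200L, 0L),
  travel_time    = c(0L, 161L, 96L, 0L, 45L)
)
dwell <- 40L  # fixed time spent at each stop, in seconds

# Rows must be ordered by trip and stop so that row i - 1 is the previous stop.
setorder(toy, trip_id, stop_sequence)

for (i in seq_len(nrow(toy))) {
  if (toy$stop_sequence[i] == 1L) next  # the first stop keeps its given times
  # Arrival = previous row's (already updated) departure + travel time.
  arrival <- toy$departure_time[i - 1L] + toy$travel_time[i]
  set(toy, i = i, j = "arrival_time",   value = arrival)
  set(toy, i = i, j = "departure_time", value = arrival + dwell)
}

toy
```

Because `set()` updates the table by reference, each iteration sees the departure time written in the previous one. This works, but it visits every row one at a time, which is what a vectorised solution avoids.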

Reproducible dataset:

library(data.table)
library(chron)

dt <- structure(list(trip_id = c("a", "a", "a", "a", "a", "b", "b",
"b", "b", "b"), stop_sequence = c(1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L), arrival_time = structure(c(0.291666666666667, 0,
0, 0, 0, 0.291666666666667, 0, 0, 0, 0), format = "h:m:s", class = "times"),
departure_time = structure(c(0.291666666666667, 0, 0, 0,
0, 0.291666666666667, 0, 0, 0, 0), format = "h:m:s", class = "times"),
travel_time = structure(c(0, 0.00186598685444013, 0.00110857958406301,
0.00183749407361369, 0.00171664297781446, 0, 0.000522388450578203,
0.00111473367541453, 0.000427755975518318, 0.00207918951573377
), format = "h:m:s", class = "times")), .Names = c("trip_id",
"stop_sequence", "arrival_time", "departure_time", "travel_time"
), class = c("data.table", "data.frame"), row.names = c(NA, -10L
))

Expected output (first four rows):

   trip_id stop_sequence arrival_time departure_time travel_time
1: a 1 07:00:00 07:00:00 00:00:00
2: a 2 07:02:41 07:03:21 00:02:41
3: a 3 07:04:57 07:05:37 00:01:36
4: a 4 07:08:16 07:08:56 00:02:39

Best answer

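I think this can be done without a loop. You can compute departure_time with a cumsum per trip, and once you have it, arrival_time is departure_time minus 40 seconds (except at the first stop of each trip, where travel_time is 0 and nothing is subtracted):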

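dt2 <- copy(dt)  # work on a copy so that dt itself is left unchanged
dt2[, c("arrival_time", "departure_time") := .(
        # arrival = departure minus the 40 s stop time (nothing subtracted at the first stop)
        cumsum(arrival_time + ifelse(travel_time == 0, 0, travel_time + times("00:00:40"))) -
            ifelse(travel_time == 0, 0, times("00:00:40")),
        # departure = start time plus the accumulated travel and stop times
        cumsum(arrival_time + ifelse(travel_time == 0, 0, travel_time + times("00:00:40")))),
    by = trip_id]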

dt2

# trip_id stop_sequence arrival_time departure_time travel_time
#1: a 1 07:00:00 07:00:00 00:00:00
#2: a 2 07:02:41 07:03:21 00:02:41
#3: a 3 07:04:57 07:05:37 00:01:36
#4: a 4 07:08:16 07:08:56 00:02:39
#5: a 5 07:11:24 07:12:04 00:02:28
#6: b 1 07:00:00 07:00:00 00:00:00
#7: b 2 07:00:45 07:01:25 00:00:45
#8: b 3 07:03:01 07:03:41 00:01:36
#9: b 4 07:04:18 07:04:58 00:00:37
#10: b 5 07:07:58 07:08:38 00:03:00

Alternatively, to avoid repeating the long cumsum for departure_time in order to get arrival_time, you can do it in two steps:

dt2[,departure_time := cumsum(arrival_time + ifelse(travel_time==0, 0, travel_time + times("00:00:40"))), by = trip_id]
dt2[, arrival_time := departure_time - ifelse(travel_time == 0 , 0, times("00:00:40"))]

A third option, posted by @eddi:

dt[, departure_time := arrival_time[1] + cumsum(travel_time) + (0:(.N-1))*times('00:00:40'), by = trip_id]
dt[, arrival_time := c(arrival_time[1], tail(departure_time, -1) - times('00:00:40')), by = trip_id]
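The third option relies on a closed form: within a trip, the departure from the k-th stop equals the start time, plus the cumulative travel time, plus (k - 1) stop times of 40 seconds. The sketch below replays that idea on made-up data in integer seconds since midnight; `toy`, `start_time` and `dwell` are hypothetical names used only for this illustration, and the chron `times` class is deliberately left out.

```r
library(data.table)

# Toy data in seconds since midnight (an assumption, not the question's dataset).
toy <- data.table(
  trip_id     = c("a", "a", "a", "b", "b"),
  start_time  = 25200L,                       # 07:00:00 for every trip
  travel_time = c(0L, 161L, 96L, 0L, 45L)     # travel time from the previous stop
)
dwell <- 40L  # fixed time spent at each stop, in seconds

# Departure from stop k = start + cumulative travel time + (k - 1) stop times.
toy[, departure_time := start_time[1] + cumsum(travel_time) + (0:(.N - 1L)) * dwell,
    by = trip_id]

# Arrival = departure minus the stop time, except at the first stop of each trip.
toy[, arrival_time := c(start_time[1], tail(departure_time, -1L) - dwell),
    by = trip_id]

toy
```

Unlike the ifelse-based versions, this form does not test for travel_time == 0, so it does not depend on zero travel times appearing only at the first stop of a trip.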

Regarding "r - Updating two columns that depend on each other row by row, using data.table", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47601241/
