
r - Vectorizing an R for loop in data.table

Reposted. Author: 行者123. Updated: 2023-12-01 04:29:38

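I am building a maintenance scheduler in R. For each machine I have routines for specific events that should be carried out on specific dates, defined by a frequency and a start date.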

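I already have a data.table holding, for each routine, its frequency (in weeks), the last known date of major maintenance, and the projected dates derived from that frequency and last date. A reduced version looks like this: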

require(data.table)

dt <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9), machine = c("t1",
"t1", "t1", "t1", "t1", "t2", "t2", "t2", "t2"), frequencyWeeks = c(4,
12, 24, 48, 96, 4, 24, 48, 96), lastMaintenance = structure(c(17889,
17889, 17889, 17889, 17889, 17871, 17871, 17871, 17871), class = "Date"),
datesRoutines = list(structure(c(17889, 17917, 17945, 17973,
18001, 18029, 18057, 18085, 18113, 18141, 18169, 18197, 18225,
18253, 18281, 18309, 18337, 18365, 18393, 18421, 18449, 18477,
18505, 18533, 18561, 18589, 18617), class = "Date"), structure(c(17889,
17973, 18057, 18141, 18225, 18309, 18393, 18477, 18561), class = "Date"),
structure(c(17889, 18057, 18225, 18393, 18561), class = "Date"),
structure(c(17889, 18225, 18561), class = "Date"), structure(c(17889,
18561), class = "Date"), structure(c(17871, 17899, 17927,
17955, 17983, 18011, 18039, 18067, 18095, 18123, 18151,
18179, 18207, 18235, 18263, 18291, 18319, 18347, 18375,
18403, 18431, 18459, 18487, 18515, 18543, 18571, 18599,
18627), class = "Date"), structure(c(17871, 18039, 18207,
18375, 18543), class = "Date"), structure(c(17871, 18207,
18543), class = "Date"), structure(c(17871, 18543), class = "Date"))), class = c("data.table",
"data.frame"), row.names = c(NA, -9L))

dt
   id machine frequencyWeeks lastMaintenance                                                         datesRoutines
1: 1 t1 4 2018-12-24 2018-12-24,2019-01-21,2019-02-18,2019-03-18,2019-04-15,2019-05-13,...
2: 2 t1 12 2018-12-24 2018-12-24,2019-03-18,2019-06-10,2019-09-02,2019-11-25,2020-02-17,...
3: 3 t1 24 2018-12-24 2018-12-24,2019-06-10,2019-11-25,2020-05-11,2020-10-26
4: 4 t1 48 2018-12-24 2018-12-24,2019-11-25,2020-10-26
5: 5 t1 96 2018-12-24 2018-12-24,2020-10-26
6: 6 t2 4 2018-12-06 2018-12-06,2019-01-03,2019-01-31,2019-02-28,2019-03-28,2019-04-25,...
7: 7 t2 24 2018-12-06 2018-12-06,2019-05-23,2019-11-07,2020-04-23,2020-10-08
8: 8 t2 48 2018-12-06 2018-12-06,2019-11-07,2020-10-08
9: 9 t2 96 2018-12-06 2018-12-06,2020-10-08

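Need: for each machine and intervention date, I want to find the routine with the highest id. Routines are recorded in order of increasing complexity, so the highest id is the most complex routine due on that date.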

What I have tried so far: I implemented it with nested for loops:
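origin <- "1970-01-01"
# Empty result container and row counter (not shown in the original snippet)
result <- data.frame(machine = character(), date = as.Date(character()),
                     rutina = numeric(), stringsAsFactors = FALSE)
count <- 1
for (j in dt[, unique(machine)]) {
  # The first routine of each machine is the most frequent one, so it covers every date.
  # for() drops the Date class, so i is the number of days since the origin.
  for (i in dt[machine == j, ][1, datesRoutines[[1]]]) {
    d <- as.Date(i, origin = origin)
    result[count, "machine"] <- j
    result[count, "date"] <- d
    # Highest id among this machine's routines scheduled on date d
    result[count, "rutina"] <- dt[machine == j, d %in% datesRoutines[[1]], by = id][V1 == TRUE, max(id)]
    count <- count + 1
  }
}

setDT(result)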

Expected result: a data.table with the machine, the date, and the routine id:
head(result)
machine date rutina
1 t1 2018-12-24 5
2 t1 2019-01-21 1
3 t1 2019-02-18 1
4 t1 2019-03-18 2
5 t1 2019-04-15 1
6 t1 2019-05-13 1

Question: can this be vectorized? What would the code look like?

Best answer

This is the best simplification I could come up with:

results <- list()
for (m in unique(dt$machine)) {
  rows <- dt[machine == m]  # only this machine's routines
  # unlist() drops the Date class, so restore it after de-duplicating
  dates <- as.Date(unique(unlist(rows$datesRoutines)), origin = "1970-01-01")
  result <- data.table(date = dates, machine = m, routine = NA_real_)
  for (k in seq_along(dates)) {
    # TRUE for each routine whose schedule contains this date (Date compared with Date)
    hit <- vapply(rows$datesRoutines, function(x) dates[k] %in% x, logical(1))
    result[k, routine := max(rows$id[hit])]
  }
  results[[m]] <- result
}
final_result <- rbindlist(results)

From here, you can go one step further and replace the inner loop:
results <- list()
for (m in unique(dt$machine)) {
  rows <- dt[machine == m]  # only this machine's routines
  dates <- as.Date(unique(unlist(rows$datesRoutines)), origin = "1970-01-01")
  result <- data.table(date = dates, machine = m)
  # Iterate over indices so each date keeps its Date class; sapply returns a plain vector
  result[, routine := sapply(seq_along(dates), function(k) {
    hit <- vapply(rows$datesRoutines, function(x) dates[k] %in% x, logical(1))
    max(rows$id[hit])
  })]
  results[[m]] <- result
}
final_result <- rbindlist(results)

Finally, for those who hate for loops:
results <- lapply(unique(dt$machine), function(m) {
  rows <- dt[machine == m]  # only this machine's routines
  dates <- as.Date(unique(unlist(rows$datesRoutines)), origin = "1970-01-01")
  # For each date, the highest id among the routines scheduled on it
  rutina <- sapply(seq_along(dates), function(k) {
    hit <- vapply(rows$datesRoutines, function(x) dates[k] %in% x, logical(1))
    max(rows$id[hit])
  })
  data.table(machine = m, date = dates, rutina = rutina)
})

final_results <- rbindlist(results)
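
Because each routine's dates sit in a list column, another option is to skip the explicit loops: unnest the column into one row per routine and date, then take the highest id per machine and date with a grouped aggregation. The block below is a minimal sketch of that idea, not part of the original answer. The table `toy` and its values are hypothetical stand-ins with the same column layout as `dt`, so that the example runs on its own.

```r
library(data.table)

# Hypothetical toy data shaped like dt: one row per routine,
# with the scheduled dates stored in a list column.
toy <- data.table(
  id = 1:4,
  machine = c("t1", "t1", "t2", "t2"),
  datesRoutines = list(
    as.Date("2019-01-01") + c(0, 28, 56, 84),
    as.Date("2019-01-01") + c(0, 84),
    as.Date("2019-01-10") + c(0, 28, 56),
    as.Date("2019-01-10") + c(0, 56)
  )
)

# Unnest: each (id, machine) group holds one row, so [[1]] extracts
# that routine's Date vector and yields one output row per date.
long <- toy[, .(date = datesRoutines[[1]]), by = .(id, machine)]

# Highest routine id scheduled on each machine/date combination.
vec_result <- long[, .(rutina = max(id)), by = .(machine, date)]
setorder(vec_result, machine, date)
print(vec_result)
```

Assuming `id` identifies each row uniquely, as it does in the sample data, the same two steps should carry over to the original table: `dt[, .(date = datesRoutines[[1]]), by = .(id, machine)][, .(rutina = max(id)), by = .(machine, date)]`.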

For more on r - vectorizing an R for loop in data.table, see the similar question on Stack Overflow: https://stackoverflow.com/questions/55205037/
