
r - How to make a nested for loop more efficient and use it with apply

Reposted. Author: 行者123. Updated: 2023-12-04 11:15:46

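I am trying to convert a working nested for loop to use apply, in the hope that this will make it faster (although from what I have read, that is not always the case). The main data frame has about 150K rows to loop over, so it is very time-consuming.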

I wrote a for loop in R that checks whether a date.time in df1 falls between two date.times in df2 and, if the codes in df1 and df2 match, pastes the Location from df2 into df1.

Below is a subset of sample data:

df1<-structure(list(date.time = structure(c(1455922438, 1455922445, 
1455922449, 1455922457, 1455922459, 1455922461), class = c("POSIXct",
"POSIXt"), tzone = ""), code = c(32221, 32222, 32221, 32222,
32222, 32221)), .Names = c("date.time", "code"), row.names = 50000:50005, class = "data.frame")

df2<-structure(list(Location = 11:12, Code = 32221:32222, t_in = structure(c(1455699600,
1455699600), class = c("POSIXct", "POSIXt"), tzone = ""), t_out = structure(c(1456401600,
1456401600), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("Location",
"Code", "t_in", "t_out"), class = "data.frame", row.names = 11:12)

The for loop works correctly but takes a very long time:

for (i in 1:nrow(df1)) {
  for (j in 1:nrow(df2)) {
    ifelse(df1$code[i] == df2$Code[j]
           & df1$date.time[i] < df2$t_out[j]
           & df1$date.time[i] > df2$t_in[j],
           df1$Location[i] <- df2$Location[j],
           NA)
  }
}

This is as far as I have got with it:

ids <- as.numeric(df2$Location)
f <- function(x) {
  a <- ids[(df2$t_in < x) & (x < df2$t_out)]
  if (length(a) == 0) NA else a
}

df1$Location <- lapply(df1$date.time, f)

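This returns two numbers for each row, because the date.time in df1 falls between t_in and t_out for both rows of df2. That is why the codes in the two data frames also need to match when the Location is pasted in.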
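One way to add the missing code check to the function above is sketched below. It is only a base R illustration on the sample data: find_location is a hypothetical helper name, and the sketch assumes at most one row of df2 matches each row of df1 (if several match, it keeps the first). It is still a row-by-row loop, so it should not be expected to be fast on 150K rows.

```r
# Sample data rebuilt from the question (time zone fixed to UTC for reproducibility)
df1 <- data.frame(
  date.time = as.POSIXct(c(1455922438, 1455922445, 1455922449,
                           1455922457, 1455922459, 1455922461),
                         origin = "1970-01-01", tz = "UTC"),
  code = c(32221, 32222, 32221, 32222, 32222, 32221)
)
df2 <- data.frame(
  Location = 11:12,
  Code     = 32221:32222,
  t_in     = as.POSIXct(c(1455699600, 1455699600), origin = "1970-01-01", tz = "UTC"),
  t_out    = as.POSIXct(c(1456401600, 1456401600), origin = "1970-01-01", tz = "UTC")
)

# Hypothetical helper: Location of the df2 row whose Code matches and whose
# (t_in, t_out) interval strictly contains x; NA if there is no such row
find_location <- function(x, code) {
  hit <- df2$Location[df2$Code == code & df2$t_in < x & x < df2$t_out]
  if (length(hit) == 0) NA_integer_ else hit[1]
}

# Apply the helper to each row of df1; vapply guarantees an integer vector
df1$Location <- vapply(
  seq_len(nrow(df1)),
  function(i) find_location(df1$date.time[i], df1$code[i]),
  integer(1)
)
```

Because vapply returns a plain integer vector rather than a list, df1$Location ends up as an ordinary column.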

Any pointers are greatly appreciated.

Best answer

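The data.table package has overlapping range joins, which can do this very quickly. The function you are looking for is foverlaps. Here is an example that does some cleanup before calling foverlaps: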

library(data.table)

dt1 <- data.table(df1)
dt2 <- data.table(df2)

## need to create a range in dt 1 to find overlaps on
dt1[,start:=date.time]
dt1[,end:=date.time]

## clean up names to match each other
setnames(dt2,c("Location","Code","start","end"))
setnames(dt1,c("code"),c("Code"))

setkey(dt1,Code,start,end)
setkey(dt2,Code,start,end)

## use foverlaps with the additional matching variable Code
out <- foverlaps(dt1, dt2, type = "any",
                 by.x = c("Code", "start", "end"),
                 by.y = c("Code", "start", "end"))

## more renaming and selection of the same subset of columns
setnames(out,"i.start","date.time")
out <- out[,.(date.time,Code,Location)]

This gives the output:

> out
             date.time  Code Location
1: 2016-02-19 14:53:58 32221       11
2: 2016-02-19 14:54:09 32221       11
3: 2016-02-19 14:54:21 32221       11
4: 2016-02-19 14:54:05 32222       12
5: 2016-02-19 14:54:17 32222       12
6: 2016-02-19 14:54:19 32222       12
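As a possible alternative to foverlaps, the same lookup can be sketched as a non-equi update join. This is not part of the answer above; it assumes a data.table version that supports inequality conditions in on= (1.9.8 or later). The sample data is rebuilt here with a fixed UTC time zone and a numeric Code column so that the block stands alone. Unlike type="any" in foverlaps, the strict inequalities mirror the < and > tests in the original loop.

```r
library(data.table)

# Sample data rebuilt from the question (UTC time zone and numeric Code are assumptions)
dt1 <- data.table(
  date.time = as.POSIXct(c(1455922438, 1455922445, 1455922449,
                           1455922457, 1455922459, 1455922461),
                         origin = "1970-01-01", tz = "UTC"),
  code = c(32221, 32222, 32221, 32222, 32222, 32221)
)
dt2 <- data.table(
  Location = 11:12,
  Code     = c(32221, 32222),
  t_in     = as.POSIXct(c(1455699600, 1455699600), origin = "1970-01-01", tz = "UTC"),
  t_out    = as.POSIXct(c(1456401600, 1456401600), origin = "1970-01-01", tz = "UTC")
)

# For each row of dt2, find the dt1 rows with the same code whose date.time
# lies strictly inside (t_in, t_out), and copy Location into them by reference.
# Rows of dt1 without a match keep NA in Location.
dt1[dt2, Location := i.Location,
    on = .(code = Code, date.time > t_in, date.time < t_out)]
```

This keeps the original row order of dt1 and avoids the renaming steps, at the cost of modifying dt1 in place. If several intervals in dt2 match the same row, the last matching one wins.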

Regarding "r - How to make a nested for loop more efficient and use it with apply", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35758997/
