
R function to evaluate an expression across variables and create new variables filled with logical values

Reposted. Author: 行者123. Updated: 2023-12-04 10:40:51

df1 (below) is an event log. Variable 1 holds (non-unique) timestamps (POSIXct). Variables 2:4 hold attributes of the events (factors).

I created df2 and df3 to define the time bins. df2 stores the start time of each time bin, and df3 stores the end time.

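The question is how to extend df1 with the variable names of df2 (which are the same as those of df3), filling in TRUE or FALSE for each event depending on whether the event falls within one of that variable's time bins.
In other words, the value is TRUE if the event falls within a time bin (defined by df2 and df3), and FALSE otherwise. Each event in df1 needs to be checked against all time bins (all pairs of elements of df2 and df3), one variable (of df2 and df3) at a time.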

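Because there are too many variables and events, I cannot do this interactively. I would like to learn how to do it the R way, avoiding explicit for loops and taking advantage of vectorization.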

Data (small sample dataset)

df1 <- data.frame(
  time.stamp = c("2015-01-05 15:00:00", "2015-01-05 15:01:00",
                 "2015-01-05 15:02:00", "2015-01-05 15:02:00",
                 "2015-01-05 15:03:00", "2015-01-05 15:03:00",
                 "2015-01-05 15:03:00", "2015-01-05 15:03:00"),
  g.id = as.factor(c("848", "737", "848", "848", "737", "848", "737", "737"))
)
df1$time.stamp <- as.POSIXct(strptime(df1$time.stamp, "%Y-%m-%d %H:%M:%S"))

df2 <- data.frame(
  m0p1 = c("2015-01-05 15:00:00", "2015-01-05 16:00:00", "2015-01-05 17:00:00"),
  m1p1 = c("2015-01-05 15:01:00", "2015-01-05 16:01:00", "2015-01-05 17:01:00"),
  m2p1 = c("2015-01-05 15:02:00", "2015-01-05 16:02:00", "2015-01-05 17:02:00"),
  m3p1 = c("2015-01-05 15:03:00", "2015-01-05 16:03:00", "2015-01-05 17:03:00")
)
df2$m0p1 <- as.POSIXct(strptime(df2$m0p1, "%Y-%m-%d %H:%M:%S"))
df2$m1p1 <- as.POSIXct(strptime(df2$m1p1, "%Y-%m-%d %H:%M:%S"))
df2$m2p1 <- as.POSIXct(strptime(df2$m2p1, "%Y-%m-%d %H:%M:%S"))
df2$m3p1 <- as.POSIXct(strptime(df2$m3p1, "%Y-%m-%d %H:%M:%S"))

df3 <- data.frame(
  m0p1 = c("2015-01-05 15:01:00", "2015-01-05 16:01:00", "2015-01-05 17:01:00"),
  m1p1 = c("2015-01-05 15:02:00", "2015-01-05 16:02:00", "2015-01-05 17:02:00"),
  m2p1 = c("2015-01-05 15:03:00", "2015-01-05 16:03:00", "2015-01-05 17:03:00"),
  m3p1 = c("2015-01-05 15:04:00", "2015-01-05 16:04:00", "2015-01-05 17:04:00")
)
df3$m0p1 <- as.POSIXct(strptime(df3$m0p1, "%Y-%m-%d %H:%M:%S"))
df3$m1p1 <- as.POSIXct(strptime(df3$m1p1, "%Y-%m-%d %H:%M:%S"))
df3$m2p1 <- as.POSIXct(strptime(df3$m2p1, "%Y-%m-%d %H:%M:%S"))
df3$m3p1 <- as.POSIXct(strptime(df3$m3p1, "%Y-%m-%d %H:%M:%S"))
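The per-column conversions above could probably be collapsed into a single call per table. A minimal sketch, assuming every column of the table holds timestamps in the same format; `tz = "UTC"` and `stringsAsFactors = FALSE` are assumptions added here for reproducibility and are not part of the original setup, which uses the session time zone:

```r
# Sample start-time table, kept as character columns before conversion.
df2 <- data.frame(
  m0p1 = c("2015-01-05 15:00:00", "2015-01-05 16:00:00", "2015-01-05 17:00:00"),
  m1p1 = c("2015-01-05 15:01:00", "2015-01-05 16:01:00", "2015-01-05 17:01:00"),
  stringsAsFactors = FALSE
)

# Convert every column in place; df2[] keeps the data.frame structure
# and the column names while replacing the column contents.
df2[] <- lapply(df2, as.POSIXct, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

str(df2)
```

The same line would apply unchanged to df3, which is the main appeal when the number of bin variables is large.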

The result would look like this:

> head(df1.extended)
           time.stamp g.id  m0p1  m1p1  m2p1  m3p1
1 2015-01-05 15:00:00  848  TRUE FALSE FALSE FALSE
2 2015-01-05 15:01:00  737 FALSE  TRUE FALSE FALSE
3 2015-01-05 15:02:00  848 FALSE FALSE  TRUE FALSE
4 2015-01-05 15:02:00  848 FALSE FALSE  TRUE FALSE
5 2015-01-05 15:03:00  737 FALSE FALSE FALSE  TRUE
6 2015-01-05 15:03:00  848 FALSE FALSE FALSE  TRUE
7 2015-01-05 15:03:00  737 FALSE FALSE FALSE  TRUE
8 2015-01-05 15:03:00  737 FALSE FALSE FALSE  TRUE

Any pointers are much appreciated. Thanks.

Best answer

You can use foverlaps from the data.table package:

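library(reshape2)
# Reshape the start and end tables to long format: one row per
# (bin variable, interval), then attach the end times to the start times.
df2 <- melt(df2, value.name = "start")
df3 <- melt(df3, value.name = "end")
df2$end <- df3$end

library(data.table)
setDT(df1)
setDT(df2)

# foverlaps needs an interval on both sides, so treat each event as a
# zero-length interval [time.stamp, time.stamp2].
df1[, time.stamp2 := time.stamp]

setkey(df2, start, end)
# For each bin variable, join the events against that variable's intervals.
# A non-NA start means the event matched one of the intervals.
res <- df2[, foverlaps(df1, .SD,
                       by.x = c("time.stamp", "time.stamp2"),
                       by.y = c("start", "end"),
                       type = "start")[, list(time.stamp, g.id, match = !is.na(start))],
           by = variable]
# Number the events within each variable so duplicate timestamps stay distinct.
res[, id := seq_len(.N), by = variable]

# Back to wide format: one logical column per bin variable.
dcast(res, id + time.stamp + g.id ~ variable, value.var = "match")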
#   id          time.stamp g.id  m0p1  m1p1  m2p1  m3p1
# 1  1 2015-01-05 15:00:00  848  TRUE FALSE FALSE FALSE
# 2  2 2015-01-05 15:01:00  737 FALSE  TRUE FALSE FALSE
# 3  3 2015-01-05 15:02:00  848 FALSE FALSE  TRUE FALSE
# 4  4 2015-01-05 15:02:00  848 FALSE FALSE  TRUE FALSE
# 5  5 2015-01-05 15:03:00  737 FALSE FALSE FALSE  TRUE
# 6  6 2015-01-05 15:03:00  848 FALSE FALSE FALSE  TRUE
# 7  7 2015-01-05 15:03:00  737 FALSE FALSE FALSE  TRUE
# 8  8 2015-01-05 15:03:00  737 FALSE FALSE FALSE  TRUE
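
Note that `type = "start"` in the code above matches an event only when its timestamp equals the start of a bin, which happens to hold for the sample data. If events can fall anywhere inside a bin, a base R alternative may be worth considering. The following is a minimal sketch, not the answer's method: it assumes half-open bins [start, end), so that an event on a shared boundary counts only toward the later bin (consistent with the expected output in the question). The names `in_any_bin`, `to_time`, `events`, `starts`, and `ends` are hypothetical, as are the reduced sample data and the fixed `tz = "UTC"`.

```r
# Hypothetical helper: parse timestamps in a fixed time zone (assumption).
to_time <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Hypothetical helper: TRUE where a timestamp falls in any half-open
# interval [start, end) of one bin variable.
in_any_bin <- function(t, start, end) {
  t <- as.numeric(t)
  start <- as.numeric(start)
  end <- as.numeric(end)
  # outer() builds an events-by-bins logical matrix for each bound.
  hit <- outer(t, start, ">=") & outer(t, end, "<")
  rowSums(hit) > 0
}

events <- data.frame(
  time.stamp = to_time(c("2015-01-05 15:00:00", "2015-01-05 15:01:00",
                         "2015-01-05 15:02:30", "2015-01-05 18:00:00")),
  g.id = factor(c("848", "737", "848", "737"))
)
starts <- data.frame(
  m0p1 = to_time(c("2015-01-05 15:00:00", "2015-01-05 16:00:00")),
  m1p1 = to_time(c("2015-01-05 15:01:00", "2015-01-05 16:01:00")),
  m2p1 = to_time(c("2015-01-05 15:02:00", "2015-01-05 16:02:00"))
)
ends <- data.frame(
  m0p1 = to_time(c("2015-01-05 15:01:00", "2015-01-05 16:01:00")),
  m1p1 = to_time(c("2015-01-05 15:02:00", "2015-01-05 16:02:00")),
  m2p1 = to_time(c("2015-01-05 15:03:00", "2015-01-05 16:03:00"))
)

# Map() walks the matching columns of starts and ends pairwise and keeps
# the column names, giving one logical column per bin variable.
flags <- as.data.frame(
  Map(function(s, e) in_any_bin(events$time.stamp, s, e), starts, ends)
)
extended <- cbind(events, flags)
print(extended)
```

The `outer()` comparison builds an events-by-bins matrix per variable, so memory grows with the product of the two; for very large logs the data.table approach above is likely the better fit.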

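For this question, "R function to evaluate an expression across variables and create new variables filled with logical values", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29317657/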
