
r - Merging 2 data frames using a condition on "hour" and "min" of df1 within the datetimes of df2

Reposted. Author: 行者123. Updated: 2023-12-04 12:12:08

I have a data frame like this, df.sample:

id <- c("A","A","A","A","A","A","A","A","A","A","A")
date <- c("2018-11-12","2018-11-12","2018-11-12","2018-11-12","2018-11-12",
          "2018-11-12","2018-11-12","2018-11-14","2018-11-14","2018-11-14",
          "2018-11-12")
hour <- c(8,8,9,9,13,13,16,6,7,19,7)
min <- c(47,59,6,18,22,36,12,32,12,21,47)
value <- c(70,70,86,86,86,74,81,77,79,83,91)
df.sample <- data.frame(id,date,hour,min,value,stringsAsFactors = FALSE)
df.sample$date <- as.Date(df.sample$date,format="%Y-%m-%d")

I have another data frame like this, df.state:

id <- c("A","A","A")
starttime <- c("2018-11-12 08:59:00","2018-11-14 06:24:17","2018-11-15 09:17:00")
endtime <- c("2018-11-12 15:57:00","2018-11-14 17:22:16","2018-11-15 12:17:32")
state <- c("Pass","Pass","Pass")

df.state <- data.frame(id,starttime,endtime,state,stringsAsFactors = FALSE)
df.state$starttime <- as.POSIXct(df.state$starttime,format="%Y-%m-%d %H:%M:%S")
df.state$endtime <- as.POSIXct(df.state$endtime,format="%Y-%m-%d %H:%M:%S")

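I am trying to merge these two data frames based on a condition.

If the hour and min in df.sample fall within the starttime and endtime of a row in df.state, then state = Pass should be merged into df.sample.

For example, row 2 in df.sample has hour = 8 and min = 59. Because that time falls within the interval beginning at starttime = 2018-11-12 08:59:00 in df.state, the value Pass is added.

This is my expected output: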
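The matching rule described above can be sketched in base R as follows. This is only an illustration, not the approach from the answer: the names `smp` and `st` are hypothetical stand-ins for a few rows of df.sample and df.state, and `tz = "UTC"` is an assumption made so the comparison does not depend on the local time zone.

```r
# Hypothetical, reduced copies of df.sample and df.state (assumed names: smp, st)
smp <- data.frame(
  id   = "A",
  date = as.Date(c("2018-11-12", "2018-11-12", "2018-11-12", "2018-11-14")),
  hour = c(8L, 8L, 16L, 6L),
  min  = c(47L, 59L, 12L, 32L),
  stringsAsFactors = FALSE
)
st <- data.frame(
  id        = "A",
  starttime = as.POSIXct(c("2018-11-12 08:59:00", "2018-11-14 06:24:17"), tz = "UTC"),
  endtime   = as.POSIXct(c("2018-11-12 15:57:00", "2018-11-14 17:22:16"), tz = "UTC"),
  state     = "Pass",
  stringsAsFactors = FALSE
)

# Combine date, hour and min into one datetime so it can be compared to the intervals
smp$time <- as.POSIXct(
  sprintf("%s %02d:%02d:00", format(smp$date), smp$hour, smp$min),
  tz = "UTC"
)

# For each sample row, take the state of the first interval that contains its time;
# both bounds are treated as inclusive, and rows with no match get NA
smp$state <- vapply(seq_len(nrow(smp)), function(i) {
  hit <- st$id == smp$id[i] &
    st$starttime <= smp$time[i] &
    st$endtime >= smp$time[i]
  if (any(hit)) st$state[which(hit)[1]] else NA_character_
}, character(1))
```

A row-by-row loop like this is easy to follow but scales poorly, which is why a join-based approach is preferable for larger data.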

id       date hour min value state
 A 2018-11-12    8  47    70
 A 2018-11-12    8  59    70  Pass
 A 2018-11-12    9   6    86  Pass
 A 2018-11-12    9  18    86  Pass
 A 2018-11-12   13  22    86  Pass
 A 2018-11-12   13  36    74  Pass
 A 2018-11-12   16  12    81
 A 2018-11-14    6  32    77  Pass
 A 2018-11-14    7  12    79  Pass
 A 2018-11-14   19  21    83
 A 2018-11-12    7  47    91

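I am able to merge the two data frames like this, but I cannot work out how to check whether the hour and min of df.sample fall within the starttime and endtime of df.state: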

library(tidyverse)
df.sample <- df.sample %>%
  left_join(df.state)

Could anyone point me in the right direction?

Best answer

If you happen to have large data frames, a non-equi join from the data.table package will be faster and easier:

library(data.table)

## convert both data.frames to data.tables by reference
setDT(df.sample)
setDT(df.state)

## create a `time` column in df.sample
df.sample[, time := as.POSIXct(paste0(date, " ", hour, ":", min, ":00"))]
## change column order
setcolorder(df.sample, c("id", "time"))

# join by id and time within start & end time limits
# "x." is used so we can refer to the column in other data.table explicitly
df.state[df.sample, .(id, time, date, hour, min, value, state = x.state),
         on = .(id, starttime <= time, endtime >= time)]
#>     id                time       date hour min value state
#>  1:  A 2018-11-12 08:47:00 2018-11-12    8  47    70  <NA>
#>  2:  A 2018-11-12 08:59:00 2018-11-12    8  59    70  Pass
#>  3:  A 2018-11-12 09:06:00 2018-11-12    9   6    86  Pass
#>  4:  A 2018-11-12 09:18:00 2018-11-12    9  18    86  Pass
#>  5:  A 2018-11-12 13:22:00 2018-11-12   13  22    86  Pass
#>  6:  A 2018-11-12 13:36:00 2018-11-12   13  36    74  Pass
#>  7:  A 2018-11-12 16:12:00 2018-11-12   16  12    81  <NA>
#>  8:  A 2018-11-14 06:32:00 2018-11-14    6  32    77  Pass
#>  9:  A 2018-11-14 07:12:00 2018-11-14    7  12    79  Pass
#> 10:  A 2018-11-14 19:21:00 2018-11-14   19  21    83  <NA>
#> 11:  A 2018-11-12 07:47:00 2018-11-12    7  47    91  <NA>

## drop the rows with no match (state is NA)
df.state[df.sample, .(id, time, date, hour, min, value, state = x.state),
         on = .(id, starttime <= time, endtime >= time), nomatch = 0L]
#>    id                time       date hour min value state
#> 1:  A 2018-11-12 08:59:00 2018-11-12    8  59    70  Pass
#> 2:  A 2018-11-12 09:06:00 2018-11-12    9   6    86  Pass
#> 3:  A 2018-11-12 09:18:00 2018-11-12    9  18    86  Pass
#> 4:  A 2018-11-12 13:22:00 2018-11-12   13  22    86  Pass
#> 5:  A 2018-11-12 13:36:00 2018-11-12   13  36    74  Pass
#> 6:  A 2018-11-14 06:32:00 2018-11-14    6  32    77  Pass
#> 7:  A 2018-11-14 07:12:00 2018-11-14    7  12    79  Pass
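As a possible variation on the join above, data.table also allows an update join that adds the state column to the sample table by reference instead of returning a new table. The sketch below is not part of the original answer: `smp` and `st` are hypothetical reduced tables, and `tz = "UTC"` is an assumption.

```r
library(data.table)

# Hypothetical reduced tables (assumed names: smp, st); tz = "UTC" is an assumption
smp <- data.table(
  id    = "A",
  time  = as.POSIXct(c("2018-11-12 08:47:00", "2018-11-12 08:59:00",
                       "2018-11-12 16:12:00", "2018-11-14 06:32:00"), tz = "UTC"),
  value = c(70, 70, 81, 77)
)
st <- data.table(
  id        = "A",
  starttime = as.POSIXct(c("2018-11-12 08:59:00", "2018-11-14 06:24:17"), tz = "UTC"),
  endtime   = as.POSIXct(c("2018-11-12 15:57:00", "2018-11-14 17:22:16"), tz = "UTC"),
  state     = "Pass"
)

# Update join: for every row of smp whose time lies inside an interval of st,
# copy that interval's state; "i." refers to columns of st (the table in i).
# Rows of smp without a matching interval keep NA in the new column.
smp[st, state := i.state, on = .(id, time >= starttime, time <= endtime)]
```

This keeps the original column order of the sample table and avoids listing every column in `j`, at the cost of modifying the table in place.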

Created on 2019-05-23 by the reprex package (v0.3.0)

Regarding "r - Merging 2 data frames using a condition on "hour" and "min" of df1 within the datetimes of df2", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56281178/
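Since the question started from tidyverse's left_join(), the same interval condition can probably also be expressed there. The sketch below is an alternative to the data.table answer, not part of it. It assumes dplyr 1.1.0 or later, where `join_by()` and its `between()` helper are available; `smp` and `st` are hypothetical reduced data frames and `tz = "UTC"` is an assumption.

```r
library(dplyr)  # join_by() requires dplyr >= 1.1.0 (assumption about the installed version)

# Hypothetical reduced data frames (assumed names: smp, st)
smp <- data.frame(
  id    = "A",
  date  = as.Date(c("2018-11-12", "2018-11-12", "2018-11-12", "2018-11-14")),
  hour  = c(8, 8, 16, 6),
  min   = c(47, 59, 12, 32),
  value = c(70, 70, 81, 77),
  stringsAsFactors = FALSE
)
st <- data.frame(
  id        = "A",
  starttime = as.POSIXct(c("2018-11-12 08:59:00", "2018-11-14 06:24:17"), tz = "UTC"),
  endtime   = as.POSIXct(c("2018-11-12 15:57:00", "2018-11-14 17:22:16"), tz = "UTC"),
  state     = "Pass",
  stringsAsFactors = FALSE
)

res <- smp %>%
  # build a datetime from date, hour and min, as in the data.table answer
  mutate(time = as.POSIXct(paste0(date, " ", hour, ":", min, ":00"), tz = "UTC")) %>%
  # keep every sample row; match on id and on time lying within [starttime, endtime]
  left_join(st, by = join_by(id, between(time, starttime, endtime))) %>%
  select(id, date, hour, min, value, state)
```

Swapping `left_join()` for `inner_join()` would drop the unmatched rows, analogous to `nomatch = 0L` in the data.table version.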
