% -6ren">
gpt4 book ai didi

r - 使用模糊合并/sqldf 合并两个数据框

转载 作者:行者123 更新时间:2023-12-04 17:09:08 28 4
gpt4 key购买 nike

我有以下数据框(df11 和 df22)我想使用“UserID=UserID”和日期差 <= 30 进行合并/完全连接。因此,如果 UserID 匹配且日期小于或等于 30,我希望它们合并为一个单独的行。我看过模糊连接 here和 sqldf here但我不知道如何为我的数据框实现其中任何一个。

df1 <- structure(list(UserID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L), 
Full.Name = c( "John Smith", "Jack Peters", "Bob Brown", "Jane Doe", "Jackie Jane", "Sarah Brown", "Chloe Brown", "John Smith" ),
Info = c("yes", "no", "yes", "yes", "yes", "yes", "no", "yes"),
EncounterID = c(13L, 14L, 15L, 16L, 17L, 18L, 19L, 13L), DateTime = c("1/2/21 00:00", "1/5/21 12:00", "1/1/21 1:31", "1/5/21 3:34", "5/9/21 5:33", "5/8/21 3:39", "12/12/21 2:30", "12/11/21 9:21"),
Temp = c("100", "103", "104", "103", "101", "102", "103", "105"),

misc = c("(null)", "no", "(null)", "(null)", "(null)","(null)", "(null)", "(null)"
)),
class = "data.frame", row.names = c(NA,
-8L))

df2 <- structure(list(UserID = c(1L, 2L, 3L, 4L, 5L, 6L),
Full.Name = c("John Smith", "Jack Peters", "Bob Brown", "Jane Doe", "Jackie Jane", "Sarah Brown"),
DOB = c("1/1/90", "1/10/90", "1/2/90", "2/20/80", "2/2/80", "12/2/80"),
EncounterID = c(13L, 14L, 15L, 16L, 17L, 18L), EncounterDate = c("1/1/21", "1/2/21", "1/1/21", "1/6/21", "5/7/21", "5/8/21"),
Type = c("Intro", "Intro", "Intro", "Intro", "Care", "Out"),
responses = c("(null)", "no",
"yes", "no", "no", "unsat")),

class = "data.frame", row.names = c(NA,
-6L))
loadedNamespaces()
install.packages("Rcpp")
library(dplyr)
library(tidyr)
install.packages("lubridate")
library(lubridate)

df11 <-
df1 %>%
separate(DateTime, c("Date", "Time"), sep=" ") %>%
mutate(Date = as_datetime(mdy(Date))) %>%
select(-Time) %>%
as_tibble()

df22 <-
df2 %>%
mutate(across(c(EncounterDate), mdy)) %>%
mutate(across(c(EncounterDate), as_datetime)) %>%
as_tibble()

@r2evans 运行第一组代码后,我得到以下输出。这与你的略有不同。

df11 <- mutate(df11, Date_m30 = Date %m-% days(30), Date_p30 = Date %m+% days(30))
df11
# A tibble: 8 x 7
UserID Full.Name Info EncounterID Date Temp misc
<int> <chr> <chr> <int> <dttm> <chr> <chr>
1 1 John Smith yes 13 2021-01-02 00:00:00 100 (null)
2 2 Jack Peters no 14 2021-01-05 00:00:00 103 no
3 3 Bob Brown yes 15 2021-01-01 00:00:00 104 (null)
4 4 Jane Doe yes 16 2021-01-05 00:00:00 103 (null)
5 5 Jackie Jane yes 17 2021-05-09 00:00:00 101 (null)
6 6 Sarah Brown yes 18 2021-05-08 00:00:00 102 (null)
7 7 Chloe Brown no 19 2021-12-12 00:00:00 103 (null)
8 1 John Smith yes 13 2021-12-11 00:00:00 105 (null)

最佳答案

一种方法是首先在其中一个列中创建“+/- 30 天”列,然后进行标准的日期范围连接。使用 sqldf:

准备:

library(dplyr)
df11 <- mutate(df11, Date_m30 = Date %m-% days(30), Date_p30 = Date %m+% days(30))
df11
# # A tibble: 8 x 9
# UserID Full.Name Info EncounterID Date Temp misc Date_m30 Date_p30
# <int> <chr> <chr> <int> <dttm> <chr> <chr> <dttm> <dttm>
# 1 1 John Smith yes 13 2021-01-02 00:00:00 100 (null) 2020-12-03 00:00:00 2021-02-01 00:00:00
# 2 2 Jack Peters no 14 2021-01-05 00:00:00 103 no 2020-12-06 00:00:00 2021-02-04 00:00:00
# 3 3 Bob Brown yes 15 2021-01-01 00:00:00 104 (null) 2020-12-02 00:00:00 2021-01-31 00:00:00
# 4 4 Jane Doe yes 16 2021-01-05 00:00:00 103 (null) 2020-12-06 00:00:00 2021-02-04 00:00:00
# 5 5 Jackie Jane yes 17 2021-05-09 00:00:00 101 (null) 2021-04-09 00:00:00 2021-06-08 00:00:00
# 6 6 Sarah Brown yes 18 2021-05-08 00:00:00 102 (null) 2021-04-08 00:00:00 2021-06-07 00:00:00
# 7 7 Chloe Brown no 19 2021-12-12 00:00:00 103 (null) 2021-11-12 00:00:00 2022-01-11 00:00:00
# 8 1 John Smith yes 13 2021-12-11 00:00:00 105 (null) 2021-11-11 00:00:00 2022-01-10 00:00:00

连接:

sqldf::sqldf("
select df11.*, df22.DOB, df22.EncounterDate, df22.Type, df22.responses
from df11
left join df22 on df11.UserID = df22.UserID
and df22.EncounterDate between df11.Date_m30 and df11.Date_p30") %>%
select(-Date_m30, -Date_p30)
# UserID Full.Name Info EncounterID Date Temp misc DOB EncounterDate Type responses
# 1 1 John Smith yes 13 2021-01-01 19:00:00 100 (null) 1/1/90 2020-12-31 19:00:00 Intro (null)
# 2 2 Jack Peters no 14 2021-01-04 19:00:00 103 no 1/10/90 2021-01-01 19:00:00 Intro no
# 3 3 Bob Brown yes 15 2020-12-31 19:00:00 104 (null) 1/2/90 2020-12-31 19:00:00 Intro yes
# 4 4 Jane Doe yes 16 2021-01-04 19:00:00 103 (null) 2/20/80 2021-01-05 19:00:00 Intro no
# 5 5 Jackie Jane yes 17 2021-05-08 20:00:00 101 (null) 2/2/80 2021-05-06 20:00:00 Care no
# 6 6 Sarah Brown yes 18 2021-05-07 20:00:00 102 (null) 12/2/80 2021-05-07 20:00:00 Out unsat
# 7 7 Chloe Brown no 19 2021-12-11 19:00:00 103 (null) <NA> <NA> <NA> <NA>
# 8 1 John Smith yes 13 2021-12-10 19:00:00 105 (null) <NA> <NA> <NA> <NA>

关于r - 使用模糊合并/sqldf 合并两个数据框,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/69856811/

28 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com