
R: merge by ID and by a date falling between two dates

Reposted. Author: 行者123. Updated: 2023-12-02 20:30:05

I have dataset1, which has two columns: ID and Application_SubmittedDate. The Application_SubmittedDate column is a date/time column.

     ID         Application_SubmittedDate
6972 2001-05-30 16:57:00
6972 2003-03-08 12:30:00
6972 2006-03-22 17:43:00
6972 2003-08-07 20:20:00
6972 2006-07-28 18:28:00
6972 2001-05-25 17:14:00
6972 2003-09-30 00:48:00
6972 2002-06-04 18:11:00
6972 2006-05-06 17:30:00
6972 2003-02-24 16:02:00
6972 2006-09-16 16:29:00
6972 2003-02-12 22:47:00
6972 2002-08-15 23:30:00
6972 2002-08-31 22:32:00
40841 2002-09-27 05:39:00
40841 2002-01-08 09:05:00
40841 2002-10-07 21:04:00
40841 2002-08-17 18:50:00
59547 2003-08-12 10:45:00
59547 2001-02-20 17:02:00
59547 2002-11-05 23:01:00
60861 2003-10-27 14:40:00
63457 2001-12-05 04:16:00
65048 2002-12-16 10:18:00
65048 2003-12-29 17:52:00
65048 2005-02-20 16:58:00
67037 2004-01-01 18:18:00
67037 2006-06-22 01:04:00
67037 2004-07-31 18:30:00
67037 2004-08-04 14:09:00
67037 2005-04-20 18:06:00
67037 2006-06-15 16:55:00

df1 <- structure(list(ID = c(6972L, 6972L, 6972L, 6972L, 6972L, 6972L,
6972L, 6972L, 6972L, 6972L, 6972L, 6972L, 6972L, 6972L, 40841L,
40841L, 40841L, 40841L, 59547L, 59547L, 59547L, 60861L, 63457L,
65048L, 65048L, 65048L, 67037L, 67037L, 67037L, 67037L, 67037L,
67037L), Application_SubmittedDate = structure(c(991241820, 1047126600,
1143049380, 1060287600, 1154111280, 990810840, 1064882880, 1023214260,
1146936600, 1046102520, 1158424140, 1045090020, 1029454200, 1030833120,
1033105140, 1010480700, 1034024640, 1029610200, 1060685100, 982688520,
1036537260, 1067265600, 1007525760, 1040033880, 1072720320, 1108918680,
1072981080, 1150938240, 1091298600, 1091628540, 1114020360, 1150390500
), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("ID",
"Application_SubmittedDate"), class = "data.frame", row.names = c(1L,
18L, 35L, 52L, 69L, 86L, 103L, 137L, 154L, 188L, 205L, 239L,
256L, 273L, 290L, 300L, 305L, 310L, 315L, 327L, 339L, 351L, 352L,
353L, 359L, 371L, 389L, 400L, 411L, 422L, 466L, 477L))

The second dataset has three columns: ID, Application_ProcessStartDate, and Application_ProcessEndDate. Both Application_ProcessStartDate and Application_ProcessEndDate are date/time columns.

    ID     Application_ProcessStartDate Application_ProcessEndDate
65048 2005-02-20 12:44:22 2005-02-23 06:07:45
65048 2006-06-21 17:31:45 2006-06-24 01:42:41
111993 2006-06-21 17:31:45 2006-06-24 01:42:41




df2 <- structure(list(ID = c(65048L, 65048L, 111993L), Application_ProcessStartDate = structure(c(1108903462,
1150911105, 1150911105), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Application_ProcessEndDate = structure(c(1109138865, 1151113361,
1151113361), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("ID",
"Application_ProcessStartDate", "Application_ProcessEndDate"), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))

My goal is to 1) merge by ID first, and 2) within those IDs, merge the rows from df1 whose Application_SubmittedDate value falls between the Application_ProcessStartDate and Application_ProcessEndDate values.

The final result should look like this:

         ID         Application_SubmittedDate   Application_ProcessStartDate    Application_ProcessEndDate
6972 2001-05-30 16:57:00
6972 2003-03-08 12:30:00
6972 2006-03-22 17:43:00
6972 2003-08-07 20:20:00
6972 2006-07-28 18:28:00
6972 2001-05-25 17:14:00
6972 2003-09-30 00:48:00
6972 2002-06-04 18:11:00
6972 2006-05-06 17:30:00
6972 2003-02-24 16:02:00
6972 2006-09-16 16:29:00
6972 2003-02-12 22:47:00
6972 2002-08-15 23:30:00
6972 2002-08-31 22:32:00
40841 2002-09-27 05:39:00
40841 2002-01-08 09:05:00
40841 2002-10-07 21:04:00
40841 2002-08-17 18:50:00
59547 2003-08-12 10:45:00
59547 2001-02-20 17:02:00
59547 2002-11-05 23:01:00
60861 2003-10-27 14:40:00
63457 2001-12-05 04:16:00
65048 2002-12-16 10:18:00
65048 2003-12-29 17:52:00
65048 2005-02-20 16:58:00 2005-02-20 12:44:22 2005-02-23 06:07:45
65048 NA 2006-06-21 17:31:45 2006-06-24 01:42:41
67037 2004-01-01 18:18:00
67037 2006-06-22 01:04:00
67037 2004-07-31 18:30:00
67037 2004-08-04 14:09:00
67037 2005-04-20 18:06:00
67037 2006-06-15 16:55:00
111993 NA 2006-06-21 17:31:45 2006-06-24 01:42:41

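I have tried foverlaps, which does not handle date/time values, only date values, so that is ruled out. I also tried JOIN from the sqldf library, but it only performs INNER JOINS, not OUTER JOINS, so that is ruled out as well. I am not sure how to achieve this. Any help or suggestions would be much appreciated.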

Best answer

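The description in the question is not entirely clear, but perhaps you want one of the following left joins. For the data shown in the question, they produce 32 rows and 3 rows respectively.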

library(sqldf)

sqldf("select a.*,
b.Application_ProcessStartDate,
b.Application_ProcessEndDate
from df1 a left join df2 b
on a.ID = b.ID and
a.Application_SubmittedDate between
b.Application_ProcessStartDate and
b.Application_ProcessEndDate")

sqldf("select a.*,
b.Application_ProcessStartDate,
b.Application_ProcessEndDate
from df2 b left join df1 a
on a.ID = b.ID and
a.Application_SubmittedDate between
b.Application_ProcessStartDate and
b.Application_ProcessEndDate")
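
For readers who would rather not depend on sqldf, the first left join above can be approximated in base R. This is only a sketch and not part of the original answer: it uses a three-row subset of the question's data, and the helper column `row_id` and the intermediate names `pairs`, `in_window`, `matched`, and `left_joined` are made up for illustration.

```r
# Small subset of the question's data (all times in UTC).
df1 <- data.frame(
  ID = c(65048L, 65048L, 67037L),
  Application_SubmittedDate = as.POSIXct(
    c("2003-12-29 17:52:00", "2005-02-20 16:58:00", "2004-01-01 18:18:00"),
    tz = "UTC")
)
df2 <- data.frame(
  ID = c(65048L, 65048L, 111993L),
  Application_ProcessStartDate = as.POSIXct(
    c("2005-02-20 12:44:22", "2006-06-21 17:31:45", "2006-06-21 17:31:45"),
    tz = "UTC"),
  Application_ProcessEndDate = as.POSIXct(
    c("2005-02-23 06:07:45", "2006-06-24 01:42:41", "2006-06-24 01:42:41"),
    tz = "UTC")
)

# Hypothetical helper column: lets matches be traced back to a df1 row
# even if df1 contains duplicated ID/date combinations.
df1$row_id <- seq_len(nrow(df1))

# Step 1: pair every df1 row with every df2 row that shares its ID.
pairs <- merge(df1, df2, by = "ID")

# Step 2: keep only the pairs whose submission time lies inside the
# window (inclusive on both ends, like SQL BETWEEN).
in_window <-
  pairs$Application_SubmittedDate >= pairs$Application_ProcessStartDate &
  pairs$Application_SubmittedDate <= pairs$Application_ProcessEndDate
matched <- pairs[in_window,
                 c("row_id", "Application_ProcessStartDate",
                   "Application_ProcessEndDate")]

# Step 3: left-join the matches back onto df1, so df1 rows without a
# window keep NA in the two df2 columns.
left_joined <- merge(df1, matched, by = "row_id", all.x = TRUE)
left_joined$row_id <- NULL

print(left_joined)
```

Step 1 builds every same-ID pair before filtering, so on large inputs with many rows per ID this may use noticeably more memory than pushing the range condition into the join itself, as the SQL does.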

Or perhaps you are looking for the union of the two left joins:

sqldf("select a.*, 
b.Application_ProcessStartDate,
b.Application_ProcessEndDate
from df1 a left join df2 b
on a.ID = b.ID and
a.Application_SubmittedDate between
b.Application_ProcessStartDate and
b.Application_ProcessEndDate

union

select a.*,
b.Application_ProcessStartDate,
b.Application_ProcessEndDate
from df2 b left join df1 a
on a.ID = b.ID and
a.Application_SubmittedDate between
b.Application_ProcessStartDate and
b.Application_ProcessEndDate")
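
One detail may be worth checking against the expected output in the question: in the second half of the union, `a.*` comes from df1, so for df2 rows without a matching submission the ID column would, as far as I can tell, come back as NA rather than 65048 or 111993. The base R sketch below is an assumption-laden alternative, not the answer's method. It keeps df2's own ID for those rows, uses a small subset of the question's data, and the names `row1`, `row2`, `only1`, `only2`, and `result` are hypothetical.

```r
# Small subset of the question's data (all times in UTC).
df1 <- data.frame(
  ID = c(65048L, 65048L, 67037L),
  Application_SubmittedDate = as.POSIXct(
    c("2003-12-29 17:52:00", "2005-02-20 16:58:00", "2004-01-01 18:18:00"),
    tz = "UTC")
)
df2 <- data.frame(
  ID = c(65048L, 65048L, 111993L),
  Application_ProcessStartDate = as.POSIXct(
    c("2005-02-20 12:44:22", "2006-06-21 17:31:45", "2006-06-21 17:31:45"),
    tz = "UTC"),
  Application_ProcessEndDate = as.POSIXct(
    c("2005-02-23 06:07:45", "2006-06-24 01:42:41", "2006-06-24 01:42:41"),
    tz = "UTC")
)

# Hypothetical row tags, used to find rows that never matched.
df1$row1 <- seq_len(nrow(df1))
df2$row2 <- seq_len(nrow(df2))

# All same-ID pairs, then only those whose submission is in the window.
pairs <- merge(df1, df2, by = "ID")
hit <- pairs$Application_SubmittedDate >= pairs$Application_ProcessStartDate &
  pairs$Application_SubmittedDate <= pairs$Application_ProcessEndDate
matched <- pairs[hit, ]

# A typed missing timestamp, so padded columns stay POSIXct.
na_time <- as.POSIXct(NA, tz = "UTC")

# df1 rows with no window: pad the two df2 columns with NA.
only1 <- df1[!df1$row1 %in% matched$row1, ]
only1$Application_ProcessStartDate <- rep(na_time, nrow(only1))
only1$Application_ProcessEndDate <- rep(na_time, nrow(only1))

# df2 rows with no submission: keep df2's ID, pad the df1 column with NA.
only2 <- df2[!df2$row2 %in% matched$row2, ]
only2$Application_SubmittedDate <- rep(na_time, nrow(only2))

cols <- c("ID", "Application_SubmittedDate",
          "Application_ProcessStartDate", "Application_ProcessEndDate")
result <- rbind(matched[, cols], only1[, cols], only2[, cols])
result <- result[order(result$ID), ]
rownames(result) <- NULL

print(result)
```

In SQL, a similar effect could probably be had by selecting `coalesce(a.ID, b.ID)` instead of relying on `a.*` for the ID; that variant is untested here.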

A similar question about merging by ID and by a date between two dates in R can be found on Stack Overflow: https://stackoverflow.com/questions/49079412/
