r - New variable by date range in R

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 05:50:57

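I am trying to create a new variable based on customer_id and date. The table is a log of all customer contacts, so customer IDs repeat. What I want is a new variable that keeps a running count per customer, based on whether contact dates fall within x days of each other. Every customer's first contact would = 1; if the gap since the previous contact is greater than x days, that contact would be 2, and so on. This is the "Journey" variable I am trying to create.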

Thanks for any guidance.

The sample data (dput output) is as follows:

structure(list(Customer = structure(c(1L, 1L, 2L, 3L, 3L, 3L, 
4L, 4L, 4L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"),
Start_dt = c("2018-04-30 13:47:13", "2018-05-03 09:22:25",
"2018-04-22 10:45:33", "2018-04-20 09:55:51", "2018-04-21 14:20:33",
"2018-05-01 15:27:43", "2018-03-28 11:25:45", "2018-04-28 10:30:35",
"2018-05-17 11:08:51", "2018-06-02 10:38:38"), End_dt = c("2018-04-30 14:22:15",
"2018-05-03 10:05:32", "2018-04-22 11:00:35", "2018-04-20 09:57:45",
"2018-04-21 14:27:14", "2018-05-01 16:03:25", "2018-03-28 11:35:54",
"2018-04-28 11:02:17", "2018-05-17 12:32:18", "2018-06-02 11:08:29"
), Journey = c(1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 3L, 4L)), class = "data.frame", row.names = c(NA,
-10L))

Best answer

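The algorithm below converts the character vectors to Date objects and then splits the data.frame by the factor column (Customer). Inside lapply, it uses the zlag function from the TSA package to get each customer's previous contact date and check the condition that defines Journey. Finally, it uses do.call with rbind to combine the per-customer data frames.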
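Before the full code, here is a minimal sketch of the lag-and-compare step on a single customer's dates. It uses base R instead of zlag, and the variable names (days, lagged, gap, new_journey) are illustrative assumptions, not part of the original answer.

```r
# Contact dates for customer C from the question's sample data.
dates <- as.Date(c("2018-04-20", "2018-04-21", "2018-05-01"))
x <- 10  # threshold in days

# Shift the dates down by one position; the first contact has no predecessor.
days <- as.numeric(dates)
lagged <- c(NA, head(days, -1))

# Days since the previous contact (NA for the first contact).
gap <- days - lagged

# Flag the start of a new journey; the first contact always starts one.
new_journey <- ifelse(is.na(gap) | gap >= x, 1, 0)

# The running total of the flags gives the journey number.
journey <- cumsum(new_journey)
journey
```

The third contact comes 10 days after the second, so with x set to 10 and a "not less than x" comparison it opens a second journey, which matches the Journey column in the question.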

df <- structure(list(Customer = structure(c(1L, 1L, 2L, 3L, 3L, 3L, 
4L, 4L, 4L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"),
Start_dt = structure(c(17651, 17654, 17643, 17641, 17642,
17652, 17618, 17649, 17668, 17684), class = "Date"), End_dt = structure(c(17651,
17654, 17643, 17641, 17642, 17652, 17618, 17649, 17668, 17684
), class = "Date")), row.names = c(NA, -10L), class = "data.frame")
library(lubridate)  # provides as_date()
library(TSA)        # provides zlag()
df$Start_dt <- as_date(df$Start_dt)
df$End_dt <- as_date(df$End_dt)

x <- 10 # 10 days

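y <- lapply(
  X = split(df, df$Customer),
  FUN = function(dfx) {
    # Previous contact date for each row (NA for the first contact)
    dfx$lagged <- as_date(zlag(dfx$Start_dt))
    # Days since the previous contact
    dfx$dt <- dfx$Start_dt - dfx$lagged
    # 1 starts a new journey (gap of at least x days), 0 continues the current one
    dfx$dt <- ifelse(dfx$dt < x, 0, 1)
    dfx$dt[1] <- 1  # the first contact always starts journey 1
    dfx$Journey <- cumsum(dfx$dt)
    # Drop the helper columns by name so that Journey is kept
    dfx[, !(names(dfx) %in% c("lagged", "dt"))]
  })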
z <- do.call(rbind, y)
rownames(z) <- NULL
z

Output:

   Customer   Start_dt     End_dt Journey
1         A 2018-04-30 2018-04-30       1
2         A 2018-05-03 2018-05-03       1
3         B 2018-04-22 2018-04-22       1
4         C 2018-04-20 2018-04-20       1
5         C 2018-04-21 2018-04-21       1
6         C 2018-05-01 2018-05-01       2
7         D 2018-03-28 2018-03-28       1
8         D 2018-04-28 2018-04-28       2
9         D 2018-05-17 2018-05-17       3
10        D 2018-06-02 2018-06-02       4
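
For comparison, the same result can probably be reached in base R without the lubridate and TSA dependencies. The sketch below is an alternative, not the answer's method: it starts from the character timestamps in the question, and the helper journey_id and the column Start_date are hypothetical names introduced for illustration. It assumes a gap of at least x days starts a new journey, as in the answer's code.

```r
# Sample data from the question, keeping the character timestamps.
df <- data.frame(
  Customer = c("A", "A", "B", "C", "C", "C", "D", "D", "D", "D"),
  Start_dt = c("2018-04-30 13:47:13", "2018-05-03 09:22:25",
               "2018-04-22 10:45:33", "2018-04-20 09:55:51",
               "2018-04-21 14:20:33", "2018-05-01 15:27:43",
               "2018-03-28 11:25:45", "2018-04-28 10:30:35",
               "2018-05-17 11:08:51", "2018-06-02 10:38:38"),
  stringsAsFactors = FALSE
)

x <- 10  # gap threshold in days (assumed, as in the answer)

# as.Date() on a "YYYY-MM-DD HH:MM:SS" string keeps only the date part.
df$Start_date <- as.Date(df$Start_dt)

# Sort by customer and date so that diff() compares consecutive contacts.
df <- df[order(df$Customer, df$Start_date), ]

# Hypothetical helper: 1 for the first contact, +1 whenever the gap is >= gap days.
journey_id <- function(days, gap) cumsum(c(1, diff(days) >= gap))

# ave() applies the helper within each customer and keeps the row order.
df$Journey <- ave(as.numeric(df$Start_date), df$Customer,
                  FUN = function(d) journey_id(d, x))
df[, c("Customer", "Start_date", "Journey")]
```

Using diff() inside ave() avoids the explicit split, lapply and rbind steps. Customers with a single contact work too, because diff() returns an empty vector and the helper then yields just 1.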

Regarding "r - New variable by date range in R", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51922812/
