
r - How to identify groups that meet a condition in two consecutive time periods in R

Repost. Author: 行者123. Updated: 2023-12-01 09:13:19

Suppose I have a simple dataset called data:

library(data.table)  # provides data.table() and the := syntax used below

customer_id <- c("1","1","1","2","2","2","2","3","3","3")
account_id <- as.character(c(11,11,11,55,55,55,55,38,38,38))
# Observation dates in ISO format, so as.Date() needs no explicit format string
obs_date <- as.Date(c("2017-01-01", "2017-02-01", "2017-03-01",
                      "2017-12-01", "2018-01-01", "2018-02-01",
                      "2018-03-01", "2018-04-01", "2018-05-01",
                      "2018-06-01"))
variable <- c(87,90,100,120,130,150,12,13,15,14)
data <- data.table(customer_id,account_id,obs_date,variable)

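I want to add another variable called indicator that equals 1 for every customer_id, account_id pair that has variable <= 90 on two or more consecutive observation dates (obs_date), and 0 otherwise. So indicator would equal 1 for the first and third customer_id, account_id pairs, as follows: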

indicator <- c(1,1,1,0,0,0,0,1,1,1)
data <- data.table(customer_id,account_id,obs_date,variable, indicator)

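Do you know how to create this indicator variable? I need to group by customer_id, account_id and identify the groups where variable <= 90 for at least two consecutive periods. Thank you very much.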

Best answer

You can do it with rle(), applied per group:

data[, v := with(rle(variable <= 90), 
any(lengths >= 2 & values)
), by=.(customer_id, account_id)]

    customer_id account_id   obs_date variable indicator     v
 1:           1         11 2017-01-01       87         1  TRUE
 2:           1         11 2017-02-01       90         1  TRUE
 3:           1         11 2017-03-01      100         1  TRUE
 4:           2         55 2017-12-01      120         0 FALSE
 5:           2         55 2018-01-01      130         0 FALSE
 6:           2         55 2018-02-01      150         0 FALSE
 7:           2         55 2018-03-01       12         0 FALSE
 8:           3         38 2018-04-01       13         1  TRUE
 9:           3         38 2018-05-01       15         1  TRUE
10:           3         38 2018-06-01       14         1  TRUE
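
The result above is a logical column (TRUE/FALSE), while the question asked for 1/0, and rle() depends on the rows already being in date order within each group. A minimal sketch of one way to handle both points follows. The helper name has_run and its arguments threshold and min_len are hypothetical, introduced only for illustration. It treats adjacent rows as consecutive periods and does not check for gaps between dates.

```r
library(data.table)

data <- data.table(
  customer_id = c("1","1","1","2","2","2","2","3","3","3"),
  account_id  = as.character(c(11,11,11,55,55,55,55,38,38,38)),
  obs_date    = as.Date(c("2017-01-01", "2017-02-01", "2017-03-01",
                          "2017-12-01", "2018-01-01", "2018-02-01",
                          "2018-03-01", "2018-04-01", "2018-05-01",
                          "2018-06-01")),
  variable    = c(87,90,100,120,130,150,12,13,15,14)
)

# rle() works on row order, so sort each group by date first
setorder(data, customer_id, account_id, obs_date)

# Hypothetical helper: TRUE if x has a run of at least min_len
# adjacent values that are all <= threshold
has_run <- function(x, threshold = 90, min_len = 2L) {
  r <- rle(x <= threshold)
  any(r$values & r$lengths >= min_len)
}

# One value per group, recycled to every row of that group, stored as 0/1
data[, indicator := as.integer(has_run(variable)),
     by = .(customer_id, account_id)]
```

With the example data, indicator comes out as 1 for the first and third groups and 0 for the second, matching the column requested in the question.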

To see how it works, look at a simpler call:

data[, rle(variable <= 90), by=.(customer_id, account_id)]

   customer_id account_id lengths values
1:           1         11       2   TRUE
2:           1         11       1  FALSE
3:           2         55       3  FALSE
4:           2         55       1   TRUE
5:           3         38       3   TRUE
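
The same logic can be traced on a plain vector, without data.table. The sketch below uses the four values of the second group (customer_id 2) to show why it is not flagged: its only value <= 90 forms a run of length 1. The variable names x, r, and flagged are illustrative only.

```r
# Values of `variable` for customer_id 2 in the example data
x <- c(120, 130, 150, 12)

# Run-length encode the condition: FALSE FALSE FALSE TRUE
r <- rle(x <= 90)

r$lengths  # run lengths: 3 1
r$values   # condition value for each run: FALSE TRUE

# A group qualifies only if some run is both TRUE and at least 2 long
flagged <- any(r$lengths >= 2 & r$values)
flagged    # FALSE
```

The long run here is the FALSE one, so the test lengths >= 2 & values fails for both runs and any() returns FALSE.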

Regarding "r - How to identify groups that meet a condition in two consecutive time periods in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52504352/
