
r - Conditional sum based on lagged rows

Reposted. Author: 行者123. Updated: 2023-12-05 08:36:57

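I have a data frame that records monthly logins. I am trying to create a counter, months_since_zero_login, that increments only when the number of logins in a month is zero. In the first month, the counter starts at zero for every customer.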

The data look like this:

library(tidyverse)

# One observation date per month, January through May 2020
obs <- seq(as.Date('2020-01-01'),
           as.Date('2020-05-01'),
           by = "month")
table <- tibble(customer = 1:3)

# Cross every customer with every month, then attach the login counts
table <- table %>%
  crossing(obs) %>%
  mutate(login = c(3, 0, 0, 0, 2,
                   0, 1, 5, 0, 0,
                   1, 3, 1, 5, 0))

This is the expected result:

   customer obs        login months_since_zero_login
      <int> <date>     <dbl>                   <dbl>
 1        1 2020-01-01     3                       0
 2        1 2020-02-01     0                       0
 3        1 2020-03-01     0                       1
 4        1 2020-04-01     0                       2
 5        1 2020-05-01     2                       0
 6        2 2020-01-01     0                       0
 7        2 2020-02-01     1                       0
 8        2 2020-03-01     5                       0
 9        2 2020-04-01     0                       0
10        2 2020-05-01     0                       1
11        3 2020-01-01     1                       0
12        3 2020-02-01     3                       0
13        3 2020-03-01     1                       0
14        3 2020-04-01     5                       0
15        3 2020-05-01     0                       0

Here is my code so far, but I am stuck on how to add 1 to the counter when consecutive zeros occur (as with customer 1):

table %>%
  group_by(customer) %>%
  mutate(months_since_zero_login = case_when(
    row_number() == 1 ~ 0,
    lag(login) == 0 & login == 0 ~ 1,
    TRUE ~ 0
  ))
# does not increase counter when there are consecutive zeroes

Best answer

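This can be done with rleid. Create a temporary grouping column, 'grp', from the runs of zero and non-zero values in 'login' within each customer. Then, grouped by 'customer' and 'grp' and with i restricted to the rows where 'login == 0', create 'months_since_zero_login' as the row sequence minus 1. Finally, replace the NA elements in that column with 0 (if needed).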

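library(data.table)
# Run id per customer: changes whenever login switches between zero and non-zero
setDT(table)[, grp := rleid(login == 0), .(customer)]
# Within each zero run, count rows from 0; non-zero rows stay NA for now
table[login == 0, months_since_zero_login := seq_len(.N) - 1,
      .(customer, grp)][, grp := NULL]
# Rows outside a zero run get 0
table[is.na(months_since_zero_login), months_since_zero_login := 0]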

-output

table
#     customer        obs login months_since_zero_login
#  1:        1 2020-01-01     3                       0
#  2:        1 2020-02-01     0                       0
#  3:        1 2020-03-01     0                       1
#  4:        1 2020-04-01     0                       2
#  5:        1 2020-05-01     2                       0
#  6:        2 2020-01-01     0                       0
#  7:        2 2020-02-01     1                       0
#  8:        2 2020-03-01     5                       0
#  9:        2 2020-04-01     0                       0
# 10:        2 2020-05-01     0                       1
# 11:        3 2020-01-01     1                       0
# 12:        3 2020-02-01     3                       0
# 13:        3 2020-03-01     1                       0
# 14:        3 2020-04-01     5                       0
# 15:        3 2020-05-01     0                       0
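
To make the role of the temporary grouping column concrete, here is a minimal base R sketch for customer 1 only. The helper run_id is hypothetical (it is not part of the answer); it is meant as a stand-in for what data.table::rleid does on a single non-empty vector, so that the run ids and the resulting counter can be inspected step by step.

```r
# Hypothetical helper (an assumption for illustration): a base R stand-in
# for data.table::rleid() on one non-empty vector.
run_id <- function(x) cumsum(c(TRUE, x[-1] != x[-length(x)]))

login   <- c(3, 0, 0, 0, 2)     # customer 1 from the question
is_zero <- login == 0
grp     <- run_id(is_zero)      # 1 2 2 2 3

# Position within each run, minus 1, kept only where login is zero
pos_in_run <- ave(seq_along(login), grp, FUN = seq_along) - 1
counter    <- ifelse(is_zero, pos_in_run, 0)
counter                         # 0 0 1 2 0
```

The three zero months share run id 2, so counting rows inside that run gives 0, 1, 2, which is the behaviour the lag-based case_when attempt could not produce.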

With dplyr, we can still use rleid (from data.table)

library(dplyr)
library(data.table)  # rleid() comes from data.table, not dplyr

table %>%
  group_by(grp = rleid(customer, login == 0), customer) %>%
  mutate(months_since_zero_login = if (all(login == 0))
    row_number() - 1 else 0) %>%
  ungroup() %>%
  select(-grp)

-output

# A tibble: 15 x 4
#    customer obs        login months_since_zero_login
#       <int> <date>     <dbl>                   <dbl>
#  1        1 2020-01-01     3                       0
#  2        1 2020-02-01     0                       0
#  3        1 2020-03-01     0                       1
#  4        1 2020-04-01     0                       2
#  5        1 2020-05-01     2                       0
#  6        2 2020-01-01     0                       0
#  7        2 2020-02-01     1                       0
#  8        2 2020-03-01     5                       0
#  9        2 2020-04-01     0                       0
# 10        2 2020-05-01     0                       1
# 11        3 2020-01-01     1                       0
# 12        3 2020-02-01     3                       0
# 13        3 2020-03-01     1                       0
# 14        3 2020-04-01     5                       0
# 15        3 2020-05-01     0                       0
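
If loading data.table is not desirable, a dplyr-only variant may be possible. The sketch below is not from the answer: it assumes dplyr 1.1.0 or later, where consecutive_id() is available, and it rebuilds the question's login data in a hypothetical tibble named logins (without the obs column) so that it runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for the question's data (obs column omitted)
logins <- tibble(
  customer = rep(1:3, each = 5),
  login    = c(3, 0, 0, 0, 2,
               0, 1, 5, 0, 0,
               1, 3, 1, 5, 0)
)

result <- logins %>%
  # consecutive_id() assigns a new id whenever login == 0 changes value;
  # grouping by customer as well keeps runs from spanning two customers
  group_by(customer, grp = consecutive_id(login == 0)) %>%
  mutate(months_since_zero_login = if_else(login == 0, row_number() - 1, 0)) %>%
  ungroup() %>%
  select(-grp)

result$months_since_zero_login
```

Using if_else on login == 0 instead of if (all(login == 0)) is a design choice of this sketch; within a run the two are equivalent, because every row of a run has the same zero or non-zero status.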

Or use rle from base R

f1 <- function(x) {
  with(rle(x == 0), rep(values, lengths) * (sequence(lengths) - 1))
}

table$months_since_zero_login <- with(table, ave(login, customer, FUN = f1))
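
The one-liner inside f1 packs several steps together. As a rough illustration, here they are unrolled for customer 2's logins; the intermediate names in_zero_run and pos_in_run are made up for this sketch.

```r
login <- c(0, 1, 5, 0, 0)                  # customer 2 from the question

r <- rle(login == 0)
r$values                                   # TRUE FALSE TRUE
r$lengths                                  # 1 2 2

in_zero_run <- rep(r$values, r$lengths)    # TRUE FALSE FALSE TRUE TRUE
pos_in_run  <- sequence(r$lengths) - 1     # 0 0 1 0 1

# Multiplying by the logical vector zeroes out positions in non-zero runs
counter <- in_zero_run * pos_in_run
counter                                    # 0 0 0 0 1
```

ave() then applies the same computation separately to each customer's login vector, so a run of zeros never carries over from one customer to the next.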

Regarding r - Conditional sum based on lagged rows, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67154763/
