
R: for loop to create new columns populated by conditional statements based on the previous column

Reposted. Author: 行者123. Updated: 2023-12-04 10:35:27

My [simplified] data looks like this:

id = sample(1:20, 5)
first_active = c(1,1,1,2,3)
week1 = c(1,1,1,0,0)
week2 = c(1,0,0,1,0)
week3 = c(1,0,1,0,1)
week4 = c(1,0,0,0,1)
week5 = c(0,0,0,0,1)

df = data.frame(cbind(id, first_active, week1, week2, week3, week4, week5))

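I want to create a for loop that, in the same data.frame, creates columns p1, p2, ... corresponding to the week1, week2, ... columns and populates them as follows:

i) if the corresponding week value is not 0, then "active"
ii) if the week value is 0 and the previous p column is "active" (p[i-1] == "active"), then "lapsed1"
iii) if the week value is 0 and the previous p column is "lapsed[j]" (p[i-1] == "lapsed[j]"), then "lapsed[j+1]"
iv) otherwise, NA

This would be the solution for the example above (using mutate from dplyr):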
df %>%
  mutate(p1 = ifelse(week1 != 0, "active", NA),
         p2 = ifelse(week2 != 0, "active",
                     ifelse(p1 == "active", "lapsed1", NA)),
         p3 = ifelse(week3 != 0, "active",
                     ifelse(p2 == "lapsed1", "lapsed2",
                            ifelse(p2 == "active", "lapsed1", NA))),
         p4 = ifelse(week4 != 0, "active",
                     ifelse(p3 == "lapsed2", "lapsed3",
                            ifelse(p3 == "lapsed1", "lapsed2",
                                   ifelse(p3 == "active", "lapsed1", NA)))),
         p5 = ifelse(week5 != 0, "active",
                     ifelse(p4 == "lapsed3", "lapsed4",
                            ifelse(p4 == "lapsed2", "lapsed3",
                                   ifelse(p4 == "lapsed1", "lapsed2",
                                          ifelse(p4 == "active", "lapsed1", NA)))))
  )


id first_active week1 week2 week3 week4 week5 p1 p2 p3 p4 p5
9 1 1 1 1 1 0 active active active active lapsed1
5 1 1 0 0 0 0 active lapsed1 lapsed2 lapsed3 lapsed4
14 1 1 0 1 0 0 active lapsed1 active lapsed1 lapsed2
3 2 0 1 0 0 0 <NA> active lapsed1 lapsed2 lapsed3
8 3 0 0 1 1 1 <NA> <NA> active active active

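I would like to create a function or for loop that does this automatically, because my original data has dozens of "week" columns to reference.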

What I have managed so far is:
df$p1 = ifelse(df$week1 > 0, "active", NA) # initiating the first p-column

for(i in 2:(ncol(df)-2)) { # defining dynamically number of periods

column_to_write = paste0("p", i, sep="") # column to be populated
prev_column = paste0("p", i-1, sep="") #previous p-column to the one that's being populated
orig_column = paste0("week", i, sep="") #reference 'week' column
j = 1 #initiating 'lapsed' number

df[column_to_write] = ifelse(df[orig_column]> 0, "active",
ifelse(df[prev_column] == "active", paste("lapsed", j, sep=""),
ifelse(df[prev_column] == paste0("lapsed", j, sep=""), paste0("lapsed", j=j+1, sep=""), NA)))

}

But this only gives me a maximum of "lapsed2", and it creates new columns named week[i] instead of p[i].
 id first_active week1 week2 week3 week4 week5     p1   week2   week3   week4   week5
9 1 1 1 1 1 0 active active active active lapsed1
5 1 1 0 0 0 0 active lapsed1 lapsed2 <NA> <NA>
14 1 1 0 1 0 0 active lapsed1 active lapsed1 lapsed2
3 2 0 1 0 0 0 <NA> active lapsed1 lapsed2 <NA>
8 3 0 0 1 1 1 <NA> <NA> active active active

How can I change the code so that the number in "lapsed" keeps increasing beyond 2?

Thanks for your help! Kasia

Best answer

In the end I gave up on the for loop and instead followed the advice posted by @Gregor; here is what I did:

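library(reshape2) # assumed source of melt() and dcast(); the post does not name the package
library(dplyr)    # %>%, mutate(), arrange(), group_by(), select(), inner_join()

df_long = melt(df, id.vars = c("id", "first_active")) # transform the wide data to long format
colnames(df_long) = c("id", "first_active", "week_num", "week_orders")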


df_long =
  df_long %>%
  mutate(p_var = paste("p", substr(week_num, 5, 5), sep="")) %>% # create p-column names that correspond to the respective weeks
  arrange(id, week_num) %>%
  group_by(id) %>%
  mutate(active_var = ifelse(week_orders != 0, "active",
                             ifelse(first_active < as.numeric(substr(week_num, 5, 5)),
                                    "lapsed", NA))) %>% # "active", "lapsed" or NA depending on user activity
  mutate(lapsed_num = sequence(rle(active_var)[["lengths"]]), # position within each run of identical statuses for a given id; restarts at 1 when "lapsed" follows "active"
         final = ifelse(active_var == "active", active_var,
                        ifelse(active_var == "lapsed", paste(active_var, lapsed_num, sep=""), NA))) %>% # finally, keep "active" as is or combine "lapsed" with its sequence number
  select(id, first_active, week_num, week_orders, p_var, final) %>%
  data.frame()
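
The sequence(rle(...)[["lengths"]]) idiom used above numbers the positions within each run of identical consecutive values. A minimal illustration on a made-up status vector (the vector and the variable names are assumptions for this sketch, not taken from the data above):

```r
# Made-up status vector for a single user
status <- c(NA, "active", "lapsed", "lapsed", "lapsed", "active", "lapsed")

# rle() finds runs of identical consecutive values; each NA forms its own run
runs <- rle(status)

# sequence() turns the run lengths into a counter that restarts at 1 for every run
position_in_run <- sequence(runs$lengths)

# Append the counter only where the status is "lapsed"; NA stays NA
final <- ifelse(status == "lapsed", paste0(status, position_in_run), status)
final
```

Because the counter restarts with every new run, a "lapsed" week that follows an "active" week is numbered 1 again.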

In the end, my data looked like this:
head(df_final, 25)
active_var id first_active week_num week_orders p_var final
<NA> 3 2 week1 0 p1 <NA>
active 3 2 week2 1 p2 active
lapsed 3 2 week3 0 p3 lapsed1
lapsed 3 2 week4 0 p4 lapsed2
lapsed 3 2 week5 0 p5 lapsed3
active 5 1 week1 1 p1 active
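
One caveat: substr(week_num, 5, 5) keeps only a single character, so it would read week10 as 1. Since the real data is said to have dozens of week columns, one possible adjustment is to strip the "week" prefix instead. This is a sketch on hypothetical labels, assuming every column name has the form "week" followed by a number:

```r
# Hypothetical week labels, including two-digit weeks (melt() yields a factor)
week_num <- factor(c("week1", "week9", "week10", "week23"))

# substr(..., 5, 5) keeps only the fifth character, so week10 and week1 collide
single_char <- substr(week_num, 5, 5)

# Stripping the "week" prefix keeps the whole number
week_index <- as.numeric(sub("^week", "", week_num))
p_var <- paste0("p", week_index)
p_var
```

The same week_index could then replace as.numeric(substr(week_num, 5, 5)) in the comparison with first_active.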

So all I needed to do was convert the data.frame back (in two steps):
df_weeks = dcast(df_long[, 1:4], id + first_active ~ week_num,  value.var = "week_orders")

df_p = dcast(df_long[, c(1:2, 5:6)], id + first_active ~ p_var, value.var = "final")

and join them:
df_solution = inner_join(df_weeks, df_p)

Voilà!
df_solution
id first_active week1 week2 week3 week4 week5 p1 p2 p3 p4 p5
3 2 0 1 0 0 0 <NA> active lapsed1 lapsed2 lapsed3
5 1 1 0 0 0 0 active lapsed1 lapsed2 lapsed3 lapsed4
8 3 0 0 1 1 1 <NA> <NA> active active active
9 1 1 1 1 1 0 active active active active lapsed1
14 1 1 0 1 0 0 active lapsed1 active lapsed1 lapsed2
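
For comparison, the for loop asked about in the question can probably be made to work as well. The following is only a sketch, not the method from the answer: instead of parsing the previous p column, it keeps a running counter of consecutive inactive weeks per row (lapsed_count and n_weeks are names introduced here), and it uses [[ ]] so that each column is handled as a plain vector rather than a one-column data frame. The ids are fixed to the ones shown in the tables above.

```r
# Same example data as in the question, with fixed ids
df <- data.frame(
  id = c(9, 5, 14, 3, 8),
  first_active = c(1, 1, 1, 2, 3),
  week1 = c(1, 1, 1, 0, 0),
  week2 = c(1, 0, 0, 1, 0),
  week3 = c(1, 0, 1, 0, 1),
  week4 = c(1, 0, 0, 0, 1),
  week5 = c(0, 0, 0, 0, 1)
)

# Number of week columns, detected by name
n_weeks <- sum(grepl("^week[0-9]+$", names(df)))

# Consecutive inactive weeks since the last active week;
# NA means the user has not been active yet
lapsed_count <- rep(NA_integer_, nrow(df))

for (i in seq_len(n_weeks)) {
  is_active <- df[[paste0("week", i)]] != 0
  # Reset to 0 on an active week, otherwise add 1 (NA + 1 stays NA)
  lapsed_count <- ifelse(is_active, 0L, lapsed_count + 1L)
  # "active" when the counter is 0, "lapsed<n>" otherwise, NA before first activity
  df[[paste0("p", i)]] <- ifelse(lapsed_count == 0L, "active",
                                 paste0("lapsed", lapsed_count))
}

df
```

Carrying the count forward as a number avoids the fixed j = 1 in the original attempt, so the "lapsed" suffix is not capped at 2.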

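Regarding "R: for loop to create new columns populated by conditional statements based on the previous column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39575664/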
