
R: replacing values based on a condition (same ID) without using a for loop

Reposted. Author: 行者123. Updated: 2023-12-04 11:34:50

I have a df similar to this one, but much larger (100,000 rows x 100 columns):

df <- data.frame(
  id = c("1", "2", "2", "3", "4", "4", "4", "4", "4", "4", "5"),
  date = c("2015-01-15", "2004-03-01", "2017-03-15", "2000-01-15", "2006-05-08", "2008-05-09",
           "2014-05-11", "2014-06-11", "2014-07-11", "2014-08-11", "2015-12-19"),
  A = c(0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1),
  B = c(1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1),
  C = c(0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0),
  D = c(0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1),
  E = c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1),
  A.1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  B.1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  C.1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  D.1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  E.1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  acumulativediff = c(0, 0, 4762, 0, 0, 732, 2925, 2956, 2986, 3017, 0)
)

What I need to end up with is:

structure(list(id = structure(c(1L, 2L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,5L), .Label = c("1", "2", "3", "4", "5"), class = "factor"), date = structure(c(9L, 2L, 11L, 1L, 3L, 4L, 5L, 6L, 7L, 8L,10L), .Label = c("2000-01-15", "2004-03-01", "2006-05-08","2008-05-09", "2014-05-11", "2014-06-11", "2014-07-11", "2014-08-11","2015-01-15", "2015-12-19", "2017-03-15"), class = "factor"), A = c(0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1), B = c(1, 0, 1, 0,1, 0, 0, 0, 1, 1, 1), C = c(0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0), D = c(0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1), E = c(1, 1, 1,0, 0, 0, 0, 0, 1, 1, 1), A.1 = c(0, 0, 4762, 0, 0, 732, 2925,0, 0, 3017, 0), B.1 = c(0, 0, 0, 0, 0, 732, 0, 0, 0, 3017,0), C.1 = c(0, 0, 4762, 0, 0, 0, 2925, 2956, 2986, 3017,
0), D.1 = c(0, 0, 0, 0, 0, 732, 2925, 2956, 0, 3017, 0),E.1 = c(0, 0, 4762, 0, 0, 0, 0, 0, 0, 3017, 0), acumulativediff = c(0, 0, 4762, 0, 0, 732, 2925, 2956, 2986, 3017, 0)), .Names = c("id","date", "A", "B", "C", "D", "E", "A.1", "B.1", "C.1", "D.1", "E.1", "acumulativediff"), row.names = c(NA,-11L), class = "data.frame")

The idea is to replace the 0s in columns A.1, B.1, C.1, etc. with the value of the "acumulativediff" column, based on two conditions:

df[i,1]  == df[i-1,1] & df[i,names] == "1" & df[i-1,names] == "1", df[i,diff]
df[i,1] == df[i-1,1] & df[i,names] == "0" & df[i-1,names] == "1", df[i,diff]
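
Both conditions return the same value and differ only in the current row's value ("1" versus "0"), so for 0/1 columns they appear to collapse into a single test: the previous row has the same id and a 1 in that column. A minimal sketch checking this on made-up inputs (`grid`, `cur`, `prev` and `same_id` are hypothetical names, not part of the original data):

```r
# Hypothetical toy inputs: every combination of current/previous 0/1 values,
# with the "same id as previous row" flag both TRUE and FALSE.
grid <- expand.grid(cur = c(0, 1), prev = c(0, 1), same_id = c(TRUE, FALSE))

# The two conditions as written in the question, joined with OR.
two_rules <- with(grid, (same_id & cur == 1 & prev == 1) |
                        (same_id & cur == 0 & prev == 1))

# The single simplified rule: same id and previous value is 1.
one_rule <- with(grid, same_id & prev == 1)

# Compare the two logical vectors element by element.
identical(two_rules, one_rule)
```

This only holds as long as the columns contain nothing but 0 and 1, which is an assumption based on the sample data.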

I managed to do this with an inefficient for loop. It seems to work for a small df, but not for the larger one (it takes about two hours):

names <- colnames(df[3:7])
names2 <- colnames(df[8:12])
diff <- which(colnames(df)=="acumulativediff")
for (i in 2:nrow(df)) {
  df[i, names2] <- ifelse(
    df[i, 1] == df[i - 1, 1] & df[i, names] == "1" & df[i - 1, names] == "1",
    df[i, diff],
    ifelse(
      df[i, 1] == df[i - 1, 1] & df[i, names] == "0" & df[i - 1, names] == "1",
      df[i, diff],
      0
    )
  )
}

Any ideas or suggestions for dropping the loop to get more efficient code?

Best answer

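I would suggest ignoring the A.1, B.1, etc. columns. Simply recreate those columns with dplyr::mutate_at, using the rule specified by the OP. dplyr::lag with default = 0 helps avoid NA results.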
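
To illustrate why the `default` argument matters, here is a small sketch of `dplyr::lag` on a made-up vector (`x` is a hypothetical example, not taken from the question's data):

```r
library(dplyr)

# Hypothetical toy vector.
x <- c(1, 0, 1)

# Without a default, the first element has no previous value and becomes NA.
lag(x)

# With default = 0, the first element is filled with 0 instead of NA.
lag(x, default = 0)
```

With the default in place, the comparison against 1 on the first row is FALSE rather than NA, so `ifelse` can fall through to 0.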

library(dplyr)

df %>% select(-ends_with(".1")) %>%
  mutate_at(vars(A:E),
            funs(l = ifelse(lag(id) == id & lag(., default = 0) == "1", acumulativediff, 0)))


#    id       date A B C D E acumulativediff  A_l  B_l  C_l  D_l  E_l
#  1  1 2015-01-15 0 1 0 0 1               0    0    0    0    0    0
#  2  2 2004-03-01 1 0 1 0 1               0    0    0    0    0    0
#  3  2 2017-03-15 1 1 0 0 1            4762 4762    0 4762    0 4762
#  4  3 2000-01-15 0 0 0 1 0               0    0    0    0    0    0
#  5  4 2006-05-08 1 1 0 1 0               0    0    0    0    0    0
#  6  4 2008-05-09 1 0 1 1 0             732  732  732    0  732    0
#  7  4 2014-05-11 0 0 1 1 0            2925 2925    0 2925 2925    0
#  8  4 2014-06-11 0 0 1 0 0            2956    0    0 2956 2956    0
#  9  4 2014-07-11 1 1 1 1 1            2986    0    0 2986    0    0
# 10  4 2014-08-11 1 1 1 0 1            3017 3017 3017 3017 3017 3017
# 11  5 2015-12-19 1 1 0 1 1               0    0    0    0    0    0
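
In newer dplyr releases `funs()` is deprecated (since 0.8.0) and `mutate_at()` is superseded by `across()` (since 1.0.0). A roughly equivalent sketch using `across()` is given here, assuming dplyr 1.0.0 or later. The names `df_small` and `res` are illustrative only, the data is a cut-down stand-in for the question's df, and the approach has not been benchmarked on the 100,000-row data:

```r
library(dplyr)

# Small stand-in: the first four rows of the question's df, columns A and B only.
df_small <- data.frame(
  id = c("1", "2", "2", "3"),
  A = c(0, 1, 1, 0),
  B = c(1, 0, 1, 0),
  acumulativediff = c(0, 0, 4762, 0),
  stringsAsFactors = FALSE
)

res <- df_small %>%
  mutate(across(
    A:B,
    # Same rule as the answer's mutate_at call: the previous row has the
    # same id and a 1 in this column. On the first row lag(id) is NA, but
    # NA & FALSE evaluates to FALSE, so the result there is 0.
    ~ ifelse(lag(id) == id & lag(.x, default = 0) == 1, acumulativediff, 0),
    # Name the new columns A_l, B_l, matching the output shown above.
    .names = "{.col}_l"
  ))

res
```

Replacing `A:B` with `A:E` should apply the same rule to the full set of columns in the question's data.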

Regarding replacing values in R based on a condition (same ID) without using a for loop, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50605733/
