
r - Clustering rows by group based on a column value, with a condition

Reposted. Author: 行者123. Updated: 2023-12-04 15:58:31

A few days ago I opened this thread:

Clustering rows by group based on column value

in which we arrived at this result:

df <- data.frame(ID = c(1,1,1,1,1,1,1,1,1,1,1, 1, 1,1,1,1,1),
Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1),
Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48),
ClusterObs1 = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5))

with:

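library(dplyr)

# Number each run of 1s in Obs1 within an ID; the 0s that follow keep that number.
# Note: funs() is deprecated since dplyr 0.8.0 and emits a warning there.
df <- df %>%
  group_by(ID) %>%
  mutate_at(vars(Obs1),
            funs(ClusterObs1 = with(rle(.), rep(cumsum(values == 1), lengths))))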
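To make the rle + rep step concrete, here is a small base R sketch on the first eight Obs1 values from the data above. The variable names x, r and cluster are only illustrative.

```r
# First eight Obs1 values from the example data
x <- c(1, 1, 0, 1, 0, 1, 1, 0)

r <- rle(x)   # run-length encoding: consecutive equal values become one run
r$values      # 1 0 1 0 1 0
r$lengths     # 2 1 1 1 2 1

# Each run of 1s opens a new cluster; a run of 0s keeps the current number
run_number <- cumsum(r$values == 1)   # 1 1 2 2 3 3

# Expand the per-run numbers back to one value per row
cluster <- rep(run_number, r$lengths) # 1 1 1 2 2 3 3 3
```

The result matches the first eight entries of ClusterObs1.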

Now I need to make a modification:

If the value of 'Control' is above 12 and both the current 'Obs1' value and the previous 'Obs1' value are equal to 1, the 'DesiredResultClusterObs1' value should be incremented by 1.

df <- data.frame(ID = c(1,1,1,1,1,1,1,1,1,1,1, 1, 1,1,1,1,1),
Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1),
Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48),
ClusterObs1 = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5),
DesiredResultClusterObs1 = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7))

I thought about adding an if_else condition, but had no success. Any ideas?

EDIT: How would this work for many columns?

Best answer

This seems to work:

df %>%
mutate(DesiredResultClusterOrbs1 = with(rle(Control > 12 & Obs1 == 1 & lag(Obs1) == 1),
rep(cumsum(values == 1), lengths)) + ClusterObs1)

   ID Obs1 Control ClusterObs1 DesiredResultClusterOrbs1
 1  1    1       0           1                         1
 2  1    1       3           1                         1
 3  1    0       3           1                         1
 4  1    1       1           2                         2
 5  1    0      12           2                         2
 6  1    1       1           3                         3
 7  1    1       1           3                         3
 8  1    0       1           3                         3
 9  1    1      36           4                         4
10  1    0      13           4                         4
11  1    0       1           4                         4
12  1    0       1           4                         4
13  1    1       2           5                         5
14  1    1      24           5                         6
15  1    1       2           5                         6
16  1    1       2           5                         6
17  1    1      48           5                         7

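Basically, we use the rle + rep mechanism from your previous thread to build a cumulative vector from the TRUE/FALSE result of the condition, and add it to the existing ClusterObs1.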
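The expression can be unpacked step by step. The following is a base R sketch of the same idea, not the answer's exact code: the names prev_obs, bump, runs and offset are illustrative, and the first row is assumed to have a previous Obs1 of 0 (with lag() it would be NA).

```r
Obs1        <- c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1)
Control     <- c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48)
ClusterObs1 <- c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5)

# Previous Obs1 value; the first row has no predecessor, so 0 is assumed here
prev_obs <- c(0, head(Obs1, -1))

# TRUE where Control > 12 and both the current and the previous Obs1 are 1
bump <- Control > 12 & Obs1 == 1 & prev_obs == 1
which(bump)   # 14 17

# rle + rep: count the runs of TRUE seen so far, expanded back to one value per row
runs   <- rle(bump)
offset <- rep(cumsum(runs$values), runs$lengths)

# Shift the existing cluster numbers by the running count
desired <- ClusterObs1 + offset
```

Row 9 is not flagged even though Control is 36, because the previous Obs1 is 0. Because rle collapses adjacent TRUE rows into a single run, two consecutive qualifying rows would raise the cluster number only once; if each such row should count separately, cumsum(bump) would be the offset to use instead.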


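If you want to create several DesiredResultClusterOrbs columns, you can use mapply. There may be a dplyr solution, but this one is base R (apart from lag, which has to be dplyr's lag for the row shift to work).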

Data:

df <- data.frame(ID = c(1,1,1,1,1,1,1,1,1,1,1, 1, 1,1,1,1,1),
Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1),
Obs2 = rbinom(17, 1, .5),
Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48),
ClusterObs1 = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5))

df <- df %>%
mutate_at(vars(Obs2),
funs(ClusterObs2= with(rle(.), rep(cumsum(values == 1), lengths))))

Loop:

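# x runs over the Obs columns, y over the matching Cluster columns
newcols <- mapply(function(x, y){
  # dplyr::lag shifts the vector down by one row; stats::lag would leave the values in place
  with(rle(df$Control > 12 & x == 1 & dplyr::lag(x) == 1),
       rep(cumsum(values == 1), lengths)) + y
}, df[2:3], df[5:6])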

This produces a matrix containing the new columns, which you can then rename and cbind to your data:

colnames(newcols) <- paste0("DesiredResultClusterOrbs", 1:2)

cbind.data.frame(df, newcols)

   ID Obs1 Obs2 Control ClusterObs1 ClusterObs2 DesiredResultClusterOrbs1 DesiredResultClusterOrbs2
 1  1    1    1       0           1           1                         1                         1
 2  1    1    1       3           1           1                         1                         1
 3  1    0    0       3           1           1                         1                         1
 4  1    1    0       1           2           1                         2                         1
 5  1    0    0      12           2           1                         2                         1
 6  1    1    0       1           3           1                         3                         1
 7  1    1    1       1           3           2                         3                         2
 8  1    0    0       1           3           2                         3                         2
 9  1    1    1      36           4           3                         4                         3
10  1    0    1      13           4           3                         4                         4
11  1    0    0       1           4           3                         4                         4
12  1    0    1       1           4           4                         4                         5
13  1    1    1       2           5           4                         5                         5
14  1    1    0      24           5           4                         6                         5
15  1    1    1       2           5           5                         6                         6
16  1    1    1       2           5           5                         6                         6
17  1    1    1      48           5           5                         7                         7
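The answer leaves a dplyr version open. One possible sketch follows, assuming dplyr 1.0.0 or later for across(). Here run_id is a hypothetical helper, Obs2 is hard-coded to the values shown in the output above (the original draws them with rbinom), and the first row of each group is treated as having a previous value of 0. Unlike the mapply version, it rebuilds the cluster numbers from the Obs columns instead of reading the existing ClusterObs columns.

```r
library(dplyr)

df <- data.frame(
  ID      = rep(1, 17),
  Obs1    = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1),
  Obs2    = c(1,1,0,0,0,0,1,0,1,1,0,1,1,0,1,1,1),  # values copied from the output above
  Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48)
)

# Hypothetical helper: number the runs that consist of 1 / TRUE,
# letting the rows after a run keep its number
run_id <- function(x) with(rle(x), rep(cumsum(values == 1), lengths))

res <- df %>%
  group_by(ID) %>%
  mutate(across(
    c(Obs1, Obs2),
    # base cluster number plus the running count of qualifying rows
    ~ run_id(.x) + run_id(Control > 12 & .x == 1 & lag(.x, default = 0) == 1),
    .names = "DesiredResultCluster{.col}"
  )) %>%
  ungroup()
```

With this data, res$DesiredResultClusterObs1 and res$DesiredResultClusterObs2 should reproduce the last two columns of the table above. Grouping by ID keeps the lag from reaching across IDs when the data holds more than one.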

Regarding r - Clustering rows by group based on a column value, with a condition, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51038794/
