
r - Flagging non-consecutive values by group in R

Reposted. Author: 行者123. Updated: 2023-12-02 02:00:05

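I have a dataset made up of several groups. Each group has consecutively numbered bins (the number of bins is not necessarily the same in every group) and a boolean presence/absence value per bin. I would like to produce some output indicating which groups contain non-consecutive "present" values.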

A minimal representation looks like this:

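# Build the example data as a list, then convert it to a data frame.
x <- list()
x$group <- c(rep("A", 4), rep("B", 5), rep("C", 4))
x$bin <- c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 4)
x$status <- c("absent", "present", "absent", "present", "absent", "present", "present", "absent", "absent", "absent", "absent", "present", "present")

# The answer refers to this data frame as df.
df <- as.data.frame(x)
df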

   group bin  status
1      A   1  absent
2      A   2 present
3      A   3  absent
4      A   4 present
5      B   1  absent
6      B   2 present
7      B   3 present
8      B   4  absent
9      B   5  absent
10     C   1  absent
11     C   2  absent
12     C   3 present
13     C   4 present

The output could be another column with a flag in the same data frame,

   group bin  status flag
1      A   1  absent    1
2      A   2 present    1
3      A   3  absent    1
4      A   4 present    1
5      B   1  absent    0
6      B   2 present    0
7      B   3 present    0
8      B   4  absent    0
9      B   5  absent    0
10     C   1  absent    0
11     C   2  absent    0
12     C   3 present    0
13     C   4 present    0

a separate data frame or matrix, for example:

  group  flag
1     A  TRUE
2     B FALSE
3     C FALSE

or a list:

> flagged_groups
[1] "A"

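Writing this up has already helped me sort out some of what I need to do to get there, but I would love to hear your ideas for a concise (tidy) way to distill my data.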

Best answer

You can do it like this:

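library(dplyr)

# Within each group, take the row positions of the "present" rows.
# A step other than 1 between neighbouring positions means there is a gap.
# The unary + converts the logical result to 0/1.
df %>%
  group_by(group) %>%
  mutate(flag = +any(diff(row_number()[status == "present"]) != 1))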

# A tibble: 13 x 4
# Groups:   group [3]
   group   bin status   flag
   <chr> <dbl> <chr>   <int>
 1 A         1 absent      1
 2 A         2 present     1
 3 A         3 absent      1
 4 A         4 present     1
 5 B         1 absent      0
 6 B         2 present     0
 7 B         3 present     0
 8 B         4 absent      0
 9 B         5 absent      0
10 C         1 absent      0
11 C         2 absent      0
12 C         3 present     0
13 C         4 present     0
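
The question also asks for a per-group summary and for a plain list of the flagged groups. The following is only a minimal base R sketch of one way to get those two shapes, not part of the original answer. The helper name has_gap and the object names gap_by_group and group_flags are made up for illustration. Unlike the dplyr code, it compares the bin values themselves rather than row positions, on the assumption that rows might not be sorted by bin within a group.

```r
# Rebuild the example data from the question.
df <- data.frame(
  group  = c(rep("A", 4), rep("B", 5), rep("C", 4)),
  bin    = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 4),
  status = c("absent", "present", "absent", "present",
             "absent", "present", "present", "absent", "absent",
             "absent", "absent", "present", "present")
)

# Hypothetical helper: TRUE when the "present" bins of one group have a gap.
# Groups with zero or one "present" bin give FALSE, because diff() is empty.
has_gap <- function(bin, status) {
  present_bins <- sort(bin[status == "present"])
  any(diff(present_bins) != 1)
}

# One logical value per group, computed from the row indices of each group.
gap_by_group <- vapply(
  split(seq_len(nrow(df)), df$group),
  function(idx) has_gap(df$bin[idx], df$status[idx]),
  logical(1)
)

# Separate data frame with one row per group.
group_flags <- data.frame(
  group = names(gap_by_group),
  flag  = unname(gap_by_group)
)

# Character vector holding only the flagged groups.
flagged_groups <- names(gap_by_group)[gap_by_group]
```

With the example data, group_flags should match the per-group table sketched in the question and flagged_groups should contain only "A". If the bins are always sorted within each group, replacing the bin values with row positions, as the dplyr answer does, gives the same result.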

Regarding "r - Flagging non-consecutive values by group in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69095762/
