
r - How to check whether all values in a grouped column are the same?

Reposted. Author: 行者123. Updated: 2023-12-05 09:28:36

How can I check whether all values in a grouped column are the same?

For example, I have the following df:

   id category yes
1 1 in 1
2 1 in 1
3 1 in 1
4 1 in 1
5 1 in 1
6 1 out 1
7 1 out 1
8 1 out 1
9 2 in 1
10 2 in 1
11 2 out 0
12 2 out 1
13 2 out 1
14 3 in 1
15 3 in 1
16 3 in 0
17 3 out 1
18 3 out 1
19 4 in 1
20 4 in 1
21 4 in 1
22 4 out 1
23 4 out 0

I'd like to do something like this:

df <- df %>%
  group_by(id, category) %>%
  mutate(
    same = ifelse(...)  # 1 if id, category, and yes have the same values in every row of the group, else 0
  )

So the expected output would look like this:

   id category yes same
1 1 in 1 1
2 1 in 1 1
3 1 in 1 1
4 1 in 1 1
5 1 in 1 1
6 1 out 1 1
7 1 out 1 1
8 1 out 1 1
9 2 in 1 1
10 2 in 1 1
11 2 out 0 0
12 2 out 1 0
13 2 out 1 0
14 3 in 1 0
15 3 in 1 0
16 3 in 0 0
17 3 out 1 1
18 3 out 1 1
19 4 in 1 1
20 4 in 1 1
21 4 in 1 1
22 4 out 1 0
23 4 out 0 0

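Rows 11-13 have the same "id" and "category", but the "yes" column has different values, so the "same" column should be 0 for them (the values are not all the same). The same applies to rows 14-16 and rows 22-23.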
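To see which id/category groups break the rule described above, one can count the distinct `yes` values per group: any group with more than one distinct value gets `same = 0`. This is a base R sketch using `aggregate()` (not part of the original question); the data frame is rebuilt inline so the snippet runs on its own.

```r
# Same data as the question's df, rebuilt compactly
df <- data.frame(
  id = rep(1:4, times = c(8, 5, 5, 5)),
  category = c(rep("in", 5), rep("out", 3), rep("in", 2), rep("out", 3),
               rep("in", 3), rep("out", 2), rep("in", 3), rep("out", 2)),
  yes = c(rep(1L, 10), 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L)
)

# Count distinct `yes` values within each id/category combination
counts <- aggregate(yes ~ id + category, data = df,
                    FUN = function(x) length(unique(x)))

# Groups with more than one distinct value are the ones whose rows get same = 0
mixed <- counts[counts$yes > 1, c("id", "category")]
mixed
```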

Here is reproducible code for df:

df <- structure(list(
  id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
         2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L),
  category = c("in", "in", "in", "in", "in", "out", "out", "out", "in", "in", "out",
               "out", "out", "in", "in", "in", "out", "out", "in", "in", "in",
               "out", "out"),
  yes = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
          0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L)),
  class = "data.frame", row.names = c(NA, -23L))

Any guidance would be greatly appreciated!

Best answer

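We can use `n_distinct` to count the unique values of `yes` within each group, compare that count with `== 1` to get a logical, and then convert the logical to binary (0/1) with `as.integer` or unary `+`: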

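library(dplyr)

df %>%
  group_by(id, category) %>%
  # n_distinct(yes) == 1 is TRUE when every `yes` in the group is identical;
  # unary + turns TRUE/FALSE into 1/0
  mutate(same = +(n_distinct(yes) == 1)) %>%
  ungroup()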
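The three steps in the answer (count distinct values, compare with 1, coerce to 0/1) can be traced by hand on a single group. This sketch uses base R's `length(unique(x))` as a stand-in for `dplyr::n_distinct(x)`, an assumption made so it runs without dplyr; the vector is the `yes` column of group id = 2, category = "out".

```r
# `yes` values of group id = 2, category = "out" (rows 11-13)
x <- c(0L, 1L, 1L)

n <- length(unique(x))          # base R stand-in for n_distinct(x): 2 distinct values
is_same <- n == 1               # logical: FALSE, the values are not all equal
same_plus <- +is_same           # unary + coerces the logical to integer
same_int <- as.integer(is_same) # explicit coercion, same result

identical(same_plus, same_int)
```

Unary `+` and `as.integer()` are interchangeable here; `+` is just shorter inside `mutate()`.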

Or, using data.table:

library(data.table)
# setDT() converts df to a data.table in place; := adds `same` by reference
setDT(df)[, same := +(uniqueN(yes) == 1), by = .(id, category)]
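If neither dplyr nor data.table is available, the same flag can be computed in base R with `ave()`, which applies a function within each group and returns a vector aligned with the original rows. This is a swapped-in base R technique, not part of the original answer; the data is rebuilt inline.

```r
# Same data as the question's df, rebuilt compactly
df <- data.frame(
  id = rep(1:4, times = c(8, 5, 5, 5)),
  category = c(rep("in", 5), rep("out", 3), rep("in", 2), rep("out", 3),
               rep("in", 3), rep("out", 2), rep("in", 3), rep("out", 2)),
  yes = c(rep(1L, 10), 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L)
)

# ave() splits `yes` by id and category, applies FUN to each piece, and
# recycles the single 0/1 result back over every row of that group
df$same <- ave(df$yes, df$id, df$category,
               FUN = function(x) as.integer(length(unique(x)) == 1))
df
```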

Regarding "r - How to check whether all values in a grouped column are the same?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/71256643/
