
r - In R - find the minimum number of cells creating groups smaller than n

Reposted. Author: 行者123. Updated: 2023-12-04 09:33:42

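I have a data frame with several categorical columns, each with a different number of unique entries. When I group_by all columns and summarise, some groups are smaller than n, where n is, say, 2. For example: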

> df
A B C
1 x z a1
2 x z a2
3 x z a1
4 x w a1
5 x w a2
6 y w a1
7 y u a2
8 y u a2
9 y u a1
10 y u a1

DF = df %>% group_by_at(c(1:3)) %>% count()

# A tibble: 7 x 4
# Groups: A, B, C [7]
A B C n
<chr> <chr> <chr> <int>
1 x w a1 1
2 x w a2 1
3 x z a1 2
4 x z a2 1
5 y u a1 2
6 y u a2 2
7 y w a1 1

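What is the most efficient way to find which cells create groups smaller than n and replace their values with a single uniform value (say, "other"), so that the smallest group created in the process has size n? I need to run this on a much larger dataset.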

Best answer

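There are several ways to approach this; for example, you could replace only the w and z values in B with other. The simplest and fastest solution I can think of is probably data.table, but whether this approach makes sense depends on your application.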

df = structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("x", "y"), class = "factor"), B = structure(c(3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("u", "w", "z"), class = "factor"), C = structure(c(1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L), .Label = c("a1", "a2"), class = "factor")), .Names = c("A", "B", "C"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
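library(data.table)
mingroup = 2
# Count each (A, B, C) group, then overwrite all three columns in rows whose group is too small
setDT(df)[, n := .N, .(A, B, C)][n < mingroup, c('A', 'B', 'C') := 'other']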

Output:

        A     B     C n
 1:     x     z    a1 2
 2: other other other 1
 3:     x     z    a1 2
 4: other other other 1
 5: other other other 1
 6: other other other 1
 7:     y     u    a2 2
 8:     y     u    a2 2
 9:     y     u    a1 2
10:     y     u    a1 2

Alternative:

df = structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("x", "y"), class = "factor"), B = structure(c(3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("u", "w", "z"), class = "factor"), C = structure(c(1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L), .Label = c("a1", "a2"), class = "factor")), .Names = c("A", "B", "C"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
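library(data.table)  # load the package before calling setDT
setDT(df)
mingroup = 2
# Coarsen one column at a time (C, then B, then A), recounting the groups after each step
for (i in c('C', 'B', 'A'))
  df[, n := .N, .(A, B, C)][n < mingroup, (i) := 'other'][, n := NULL]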

Output:

        A     B     C
 1:     x     z    a1
 2: other other other
 3:     x     z    a1
 4:     x     w other
 5:     x     w other
 6: other other other
 7:     y     u    a2
 8:     y     u    a2
 9:     y     u    a1
10:     y     u    a1
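Since the question asks about the minimum number of cells, it may help to count how many cells each variant overwrites. The sketch below re-implements both variants in base R (my substitution, so that it runs without data.table); `group_size`, `all_at_once` and `stepwise` are hypothetical names introduced for illustration, and the data is stored as character columns.

```r
# Sample data from the question, stored as character columns (assumption)
orig <- data.frame(
  A = c("x", "x", "x", "x", "x", "y", "y", "y", "y", "y"),
  B = c("z", "z", "z", "w", "w", "w", "u", "u", "u", "u"),
  C = c("a1", "a2", "a1", "a1", "a2", "a1", "a2", "a2", "a1", "a1"),
  stringsAsFactors = FALSE
)
cols <- c("A", "B", "C")
mingroup <- 2

# Hypothetical helper: for each row, the size of the group it belongs to
group_size <- function(d, cols) {
  key <- do.call(paste, c(unname(as.list(d[cols])), sep = "|"))
  ave(seq_len(nrow(d)), key, FUN = length)
}

# Variant 1: overwrite all columns at once in rows of small groups
all_at_once <- orig
small <- group_size(all_at_once, cols) < mingroup
all_at_once[small, cols] <- "other"

# Variant 2: one column at a time (C, then B, then A), recounting each time
stepwise <- orig
for (col in c("C", "B", "A")) {
  small <- group_size(stepwise, cols) < mingroup
  stepwise[small, col] <- "other"
}

# Number of cells that differ from the original data
cells_all_at_once <- sum(all_at_once != orig)
cells_stepwise <- sum(stepwise != orig)
cat(cells_all_at_once, cells_stepwise, "\n")  # 12 8 on this sample
```

On this sample the column-by-column variant changes fewer cells while still leaving no group smaller than 2. It is a greedy heuristic, though: the result depends on the column order, and nothing here shows that it reaches the true minimum in general.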

Regarding "r - In R - find the minimum number of cells creating groups smaller than n", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48486792/
