
r - Combining low frequency counts

Reposted. Author: 行者123. Updated: 2023-12-04 12:00:43

I am trying to collapse a nominal categorical vector by combining low-frequency counts into an "Other" category.

The data (a column of a data frame) looks like this and contains information on all 50 states:

California
Florida
Alabama
...
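table(colname)/length(colname) correctly returns the frequencies. What I want to do is lump together everything that falls below a given threshold (say, f = 0.02). What is the right way to do this?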
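For reference, here is a small sketch of the frequency step the question describes, using a hypothetical states vector in place of the real column. prop.table(table(x)) gives the same result as table(x)/length(x) when there are no missing values. With NAs present the two differ, because table() drops NA by default while length() still counts it.

```r
# Hypothetical stand-in for the data frame column described in the question
states <- c("California", "Florida", "Alabama",
            "California", "Florida", "California")

# The question's approach: counts divided by the number of elements
freq_manual <- table(states) / length(states)

# prop.table() divides a table by its own total
freq_prop <- prop.table(table(states))

all.equal(freq_manual, freq_prop)  # TRUE: identical when there are no NAs

# With a missing value the two approaches diverge
with_na <- c(states, NA)
sum(table(with_na) / length(with_na))  # 6/7: table() drops NA, length() counts it
sum(prop.table(table(with_na)))        # 1 (up to floating point)
```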

Best answer

From the sound of it, the following should work for you:

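condenseMe <- function(vector, threshold = 0.02, newName = "Other") {
  # Categories whose relative frequency is below the threshold
  toCondense <- names(which(prop.table(table(vector)) < threshold))
  # Replace every occurrence of those categories with the new label
  vector[vector %in% toCondense] <- newName
  vector
}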
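To see what condenseMe() does internally, here are the same steps run by hand on a small made-up vector (x and the 0.2 threshold are illustrative only): prop.table() turns counts into proportions, which() keeps the positions below the threshold together with their category names, and %in% then finds every element that belongs to one of those categories.

```r
x <- c("a", "a", "a", "b", "c", "c", "c", "c")  # illustrative toy vector
threshold <- 0.2                                 # illustrative threshold

props <- prop.table(table(x))     # a = 0.375, b = 0.125, c = 0.5
low <- which(props < threshold)   # named index: only "b" is below 0.2
toCondense <- names(low)          # "b"

y <- x
y[y %in% toCondense] <- "Other"
y  # "a" "a" "a" "Other" "c" "c" "c" "c"
```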

Try it out:
## Sample data
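# The output below was produced before R 3.6.0, which changed the default
# sample() algorithm; on newer R the same seed may give a different draw
set.seed(1)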
a <- sample(c("A", "B", "C", "D", "E", sample(letters[1:10], 55, TRUE)))

round(prop.table(table(a)), 2)
# a
# a A b B c C d D e E f g h
# 0.07 0.02 0.07 0.02 0.10 0.02 0.10 0.02 0.12 0.02 0.07 0.12 0.13
# i j
# 0.08 0.07
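Note that A through E print as 0.02 in the rounded table, yet they are still replaced in the output of condenseMe(a). Each of them occurs once among the 60 values, so its unrounded proportion is 1/60, about 0.0167, which is below the 0.02 threshold. A quick check of that arithmetic:

```r
n <- 60       # length(a): 5 uppercase letters plus 55 lowercase draws
p <- 1 / n    # proportion of a category that appears exactly once
round(p, 2)   # 0.02, which is what the rounded table shows
p             # 0.01666667
p < 0.02      # TRUE, so condenseMe() lumps such categories into "Other"
```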

a
# [1] "c" "d" "d" "e" "j" "h" "c" "h" "g" "i" "g" "d" "f" "D" "g" "h"
# [17] "h" "a" "b" "h" "e" "g" "h" "b" "d" "e" "e" "g" "i" "f" "d" "e"
# [33] "g" "c" "g" "a" "B" "i" "i" "b" "i" "j" "f" "d" "c" "h" "E" "j"
# [49] "j" "c" "C" "e" "f" "a" "a" "h" "e" "c" "A" "b"

condenseMe(a)
# [1] "c" "d" "d" "e" "j" "h" "c" "h"
# [9] "g" "i" "g" "d" "f" "Other" "g" "h"
# [17] "h" "a" "b" "h" "e" "g" "h" "b"
# [25] "d" "e" "e" "g" "i" "f" "d" "e"
# [33] "g" "c" "g" "a" "Other" "i" "i" "b"
# [41] "i" "j" "f" "d" "c" "h" "Other" "j"
# [49] "j" "c" "Other" "e" "f" "a" "a" "h"
# [57] "e" "c" "Other" "b"
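Applied to the situation in the question, the function works on a data frame column directly. The data frame and its state column below are hypothetical stand-ins for the real data; Alabama makes up 1 of 100 rows (0.01), so it falls below the default threshold. Writing the result to a new column keeps the original values available.

```r
condenseMe <- function(vector, threshold = 0.02, newName = "Other") {
  toCondense <- names(which(prop.table(table(vector)) < threshold))
  vector[vector %in% toCondense] <- newName
  vector
}

# Hypothetical data standing in for the question's 50-state column
df <- data.frame(
  state = c(rep("California", 60), rep("Florida", 39), "Alabama"),
  stringsAsFactors = FALSE  # keep a character column on R < 4.0 as well
)

# Store the condensed version in a new column
df$state_grouped <- condenseMe(df$state)
table(df$state_grouped)  # California 60, Florida 39, Other 1
```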

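Note, however, that if you are working with factors, you should convert them with as.character first. Otherwise, assigning "Other", which is not an existing level, produces NA along with an "invalid factor level" warning.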
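If you would rather keep the result as a factor, one alternative (a sketch, not part of the original answer; condenseFactor is a hypothetical name) is to rename the low-frequency levels instead of the values. Assigning the same label to several levels with levels<- merges them into a single level, so no NA is introduced.

```r
# Hypothetical helper: merges low-frequency levels and returns a factor
condenseFactor <- function(f, threshold = 0.02, newName = "Other") {
  f <- factor(f)  # also accepts character input; drops unused levels
  props <- prop.table(table(f))
  low <- names(props)[props < threshold]
  # Giving several levels the same label merges them into one level
  levels(f)[levels(f) %in% low] <- newName
  f
}

f <- factor(c(rep("California", 60), rep("Florida", 38), "Alabama", "Alaska"))
g <- condenseFactor(f)
levels(g)  # "Other" "California" "Florida"
table(g)   # Other 2, California 60, Florida 38
```

The merged level takes the position of the first level it replaces, so you may want to reorder the levels afterwards with factor(g, levels = ...).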

A similar question about r - Combining low frequency counts can be found on Stack Overflow: https://stackoverflow.com/questions/34385340/
