
r - Using dplyr::mutate(across()) to apply a custom function to multiple columns

Reposted. Author: 行者123. Updated: 2023-12-05 08:49:35

Data frame df:

a = c("aa", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb","bb") 
b = c("aa", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb","bb")
c = c("aa", "aa", "aa", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb","bb")
d = c(1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 1)
df = data.frame(a,b,c,d)

Column names:

cols <- c("a","b","c")

Function:

rare_label <- function(x) {
  freq = prop.table(table(unlist(x)))
  make_rare = names(freq)[freq < 0.20]
  lapply(x,
         function(x) {
           replace(x, x %in% make_rare, "Rare")
         })
}

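I want to use dplyr::mutate(across()) to evaluate the proportions of all values across a, b, and c combined, and then change any category whose proportion is below 20% to "Rare".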

Desired output:

     a    b    c
Rare Rare Rare
bb bb Rare
cc cc Rare
bb bb bb
bb bb bb
cc cc cc
bb bb bb
. . .
. . .
. . .

The code below throws an error, and I am not sure why.

df %<>%
mutate(across(where(cols), ~rare_label(.)

Error: unexpected symbol in: " mutate(across(where(cols),~rare_label(.) View"

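Best answer

The error itself is a syntax problem: the call is missing two closing parentheses, so the parser runs into the next token. In addition, where() expects a predicate function such as is.character, not a character vector of names; all_of(cols) is the helper for selecting by name. One option could be: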

library(dplyr)

# Within each listed column, replace labels whose share is below 20%
df %>%
  mutate(across(all_of(cols),
                ~ replace(., . %in% names(which(prop.table(table(.)) < 0.20)), "rare")))

      a    b    c d
1  rare rare rare 1
2    bb   bb rare 1
3    cc   cc rare 2
4    bb   bb   bb 2
5    bb   bb   bb 3
6    cc   cc   cc 3
7    bb   bb   bb 1
8    bb   bb   bb 1
9    cc   cc   cc 1
10   cc   cc   cc 1
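
The difference between selecting columns by name and selecting them by a predicate can be sketched on a small example. The data frame `toy` and the use of `toupper` are hypothetical and only serve as an illustration; they are not part of the original question.

```r
library(dplyr)

# Hypothetical toy data, not the data frame from the question
toy <- data.frame(
  a = c("x", "y", "y"),
  b = c("u", "u", "v"),
  d = c(1, 2, 3),
  stringsAsFactors = FALSE
)
cols <- c("a", "b")

# all_of() selects columns from a character vector of names
by_name <- toy %>% mutate(across(all_of(cols), toupper))

# where() selects columns for which a predicate function returns TRUE
by_type <- toy %>% mutate(across(where(is.character), toupper))

# In this toy case both calls touch the same columns, a and b
identical(by_name, by_type)
```

Passing `cols` to where() fails because a character vector is not a function; all_of() is intended for exactly that case.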

If you want to apply an existing function:

fun <- function(x) replace(x, x %in% names(which(prop.table(table(x)) < 0.20)), "rare")

df %>%
  mutate(across(all_of(cols), fun))
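
Note that both variants above compute the proportions within each column separately. If the proportions should instead be pooled over all selected columns, as the question describes, one possible sketch is to compute the rare labels once up front and only do the replacement inside across(). The data frame `toy` and the names `pooled_freq` and `pooled_rare` are hypothetical and chosen so that the two approaches give different results.

```r
library(dplyr)

# Hypothetical toy data, chosen so pooled and per-column results differ
toy <- data.frame(
  a = c("aa", "bb", "bb", "bb", "bb", "bb"),
  b = c("aa", "aa", "dd", "cc", "cc", "cc"),
  stringsAsFactors = FALSE
)
cols <- c("a", "b")

# Proportion of each label over all selected columns taken together
pooled_freq <- prop.table(table(unlist(toy[cols])))
pooled_rare <- names(pooled_freq)[pooled_freq < 0.20]

# Replace pooled-rare labels in every selected column
pooled <- toy %>%
  mutate(across(all_of(cols), ~ replace(.x, .x %in% pooled_rare, "Rare")))

# Per-column variant for comparison (same logic as the answer above)
per_column <- toy %>%
  mutate(across(all_of(cols),
                ~ replace(.x, .x %in% names(which(prop.table(table(.x)) < 0.20)), "Rare")))
```

Here "aa" makes up 1 of 6 values in column a but 3 of 12 values overall, so the per-column variant relabels it in column a while the pooled variant leaves it alone. For the data in the question both approaches appear to give the same result, because "aa" is below 20% either way.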

Regarding r - Using dplyr::mutate(across()) to apply a custom function to multiple columns, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63731527/
