
r - weighted.mean, summarise() and across()

Reposted · Author: 行者123 · Updated: 2023-12-04 01:14:56

I want to aggregate the following data frame by `number` (variables y and z), weighting the summaries by `weight`. This works as follows:

df = data.frame(number = c("a", "a", "a", "b", "c", "c"), y = c(1, 2, 3, 4, 1, 7),
                z = c(2, 2, 6, 8, 9, 1), weight = c(1, 1, 3, 1, 2, 1))


library(dplyr)

# summarise_at() is superseded and funs() is deprecated in current dplyr
aggregate = df %>%
  group_by(number) %>%
  summarise_at(vars(y, z), funs(weighted.mean(., w = weight)))

Since summarise_at() should no longer be used, I tried across() instead, but without success:

aggregate = df %>%
  group_by(number) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))

# this works for mean but I can't just change it with "weighted.mean" etc.
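To see why swapping in `weighted.mean` by name does not weight anything, here is a small base R sketch using the values of group "a". It assumes only that across() calls each listed function with the column alone; the names `y_a`, `w_a`, `no_w` and `with_w` are hypothetical, chosen for illustration.

```r
# Base R only. Group "a" of df: y = 1, 2, 3 with weight = 1, 1, 3.
y_a <- c(1, 2, 3)
w_a <- c(1, 1, 3)

# Roughly what list(weighted = weighted.mean) inside across() does:
# weighted.mean() receives only the column, so `w` is missing and every
# value gets equal weight, i.e. the result is just mean(y_a).
no_w <- weighted.mean(y_a)

# What a lambda such as ~weighted.mean(., w = weight) does instead:
# the group's weights are passed explicitly, giving sum(y_a * w_a) / sum(w_a).
with_w <- weighted.mean(y_a, w = w_a)

print(c(mean = mean(y_a), no_w = no_w, with_w = with_w))
```

Both `mean(y_a)` and `no_w` are 2, while `with_w` is 12 / 5 = 2.4, which is why the weights have to be supplied through an anonymous function.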


Best answer

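We can pass an anonymous function with `~`. Judging from the summarise_at() call, the OP only wants summaries of columns 'y' and 'z'. Using everything() would also return the mean, sd and weighted.mean of the 'weight' column itself, which makes little sense.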

library(dplyr)
df %>%
  group_by(number) %>%
  summarise(across(c(y, z),
                   list(mean = mean, sd = sd,
                        weighted = ~weighted.mean(., w = weight))),
            .groups = 'drop')
# A tibble: 3 x 7
#  number y_mean  y_sd y_weighted z_mean  z_sd z_weighted
#  <chr>   <dbl> <dbl>      <dbl>  <dbl> <dbl>      <dbl>
#1 a           2  1           2.4   3.33  2.31       4.4
#2 b           4 NA           4     8    NA          8
#3 c           4  4.24        3     5     5.66       6.33
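The weighted columns in that output can be cross-checked without dplyr. This is a minimal base R sketch, assuming the same `df` as in the question; the name `check` is hypothetical.

```r
df <- data.frame(number = c("a", "a", "a", "b", "c", "c"),
                 y = c(1, 2, 3, 4, 1, 7),
                 z = c(2, 2, 6, 8, 9, 1),
                 weight = c(1, 1, 3, 1, 2, 1))

# split() yields one data frame per value of `number`; for each group,
# weighted.mean() computes sum(x * w) / sum(w) for y and for z.
# sapply() returns a 2 x 3 matrix, and t() turns it into one row per group.
check <- t(sapply(split(df, df$number), function(d) {
  c(y_weighted = weighted.mean(d$y, w = d$weight),
    z_weighted = weighted.mean(d$z, w = d$weight))
}))

print(round(check, 2))
```

For group "c", for example, z_weighted is (9 * 2 + 1 * 1) / (2 + 1) = 19 / 3, printed as 6.33.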

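Usually, mean and sd work fine when there are no NA elements. If there are NA values, however, we may need na.rm = TRUE (the default is FALSE). In that case, a lambda (`~`) call is useful for passing the additional argument; weighted.mean() accepts na.rm as well.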

df %>%
  group_by(number) %>%
  summarise(across(c(y, z),
                   list(mean = ~mean(., na.rm = TRUE), sd = ~sd(., na.rm = TRUE),
                        weighted = ~weighted.mean(., w = weight, na.rm = TRUE))),
            .groups = 'drop')
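What na.rm changes can be seen outside dplyr with a toy vector. This base R sketch uses illustrative data (not from the question), and the variable names are hypothetical.

```r
# Toy vector with one missing value, plus matching weights.
x <- c(1, NA, 3)
w <- c(1, 1, 2)

# With the default na.rm = FALSE, a single NA makes each summary NA.
m_default  <- mean(x)
wm_default <- weighted.mean(x, w = w)

# With na.rm = TRUE the NA is dropped; weighted.mean() also drops the
# weight paired with the missing value before computing sum(x * w) / sum(w).
m_rm  <- mean(x, na.rm = TRUE)                 # (1 + 3) / 2
sd_rm <- sd(x, na.rm = TRUE)                   # sd of c(1, 3)
wm_rm <- weighted.mean(x, w = w, na.rm = TRUE) # (1*1 + 3*2) / (1 + 2)

print(c(m_default = m_default, wm_default = wm_default,
        m_rm = m_rm, sd_rm = sd_rm, wm_rm = wm_rm))
```

Inside across(), the same extra arguments go into the lambdas, e.g. `~weighted.mean(., w = weight, na.rm = TRUE)`. Note that na.rm here only removes NA values in `x`; an NA in the weights themselves would still produce NA.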

For the original question "r - weighted.mean, summarise() and across()" on Stack Overflow, see: https://stackoverflow.com/questions/63651494/
