
r - Finding proportions of grouped observations with dplyr in R

Reposted. Author: 行者123. Updated: 2023-12-02 20:55:11

I often use the group_by() and summarize() functions from the dplyr package in R (note: the same applies if the summary statistic is count() instead of sum()).

Here is an example:

library(dplyr)

data <- data.frame(
  group = sample(rep(c("Group A", "Group B", "Group C", "Group D"), 4), 16, replace = FALSE),
  factor = sample(rep(c("Factor 1", "Factor 2"), 8), 16, replace = FALSE),
  var1 = sample(1:16)
)

Here is the output:

out_df <- data %>%
  group_by(group, factor) %>%
  summarize(sum_var1 = sum(var1))

print(out_df)

Source: local data frame [7 x 3]
Groups: group [4]

    group   factor sum_var1
   <fctr>   <fctr>    <int>
1 Group A Factor 1       29
2 Group B Factor 1        8
3 Group C Factor 1       33
4 Group D Factor 1       12
5 Group A Factor 2       27
6 Group B Factor 2       10
7 Group C Factor 2       17

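Now, I often want to find the proportion that each sum_var1 value makes up, not of the overall total, but of the total within a level of some factor, such as the factor variable here.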
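In symbols, the quantity being asked for is the following, where $s_{g,f}$ (notation introduced here for illustration) is the sum_var1 value for group $g$ at factor level $f$:

```latex
p_{g,f} = \frac{s_{g,f}}{\sum_{g'} s_{g',f}},
\qquad
\sum_{g} p_{g,f} = 1 \quad \text{for every factor level } f
```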

I usually do this by finding the sum for each level of the factor and then manually dividing the observations by it, like this:

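# Look up the total of sum_var1 for each factor level (here 82 and 54)
out_df %>% group_by(factor) %>% summarize(factor_sum = sum(sum_var1))
# Type those totals in by hand, once per row (assumes rows are sorted by factor)
to_divide <- c(rep(82, 4), rep(54, 4))
# Divide each row by the total of its factor level
out_df$factor_prop_sum_var1 <- out_df$sum_var1 / to_divide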
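The same two steps can be sketched without typing the totals in by hand, by joining the per-factor totals back onto the table before dividing. This is a minimal sketch under stated assumptions: the small `out_df` below is a hypothetical stand-in with made-up values, not the question's data, and `factor_totals` is an illustrative name.

```r
library(dplyr)

# Hypothetical stand-in for out_df (made-up values, not the question's data)
out_df <- data.frame(
  group    = c("Group A", "Group B", "Group A", "Group B"),
  factor   = c("Factor 1", "Factor 1", "Factor 2", "Factor 2"),
  sum_var1 = c(3, 1, 2, 6)
)

# Step 1: total of sum_var1 within each factor level (replaces the hard-coded 82 and 54)
factor_totals <- out_df %>%
  group_by(factor) %>%
  summarize(factor_sum = sum(sum_var1))

# Step 2: attach each row's factor total by matching on factor, then divide
out_df <- out_df %>%
  left_join(factor_totals, by = "factor") %>%
  mutate(factor_prop_sum_var1 = sum_var1 / factor_sum)

print(out_df)
```

Because the join matches rows on `factor`, this version no longer depends on the rows being sorted by factor level, unlike the hard-coded `to_divide` vector.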

This produces the desired output, and I can check that factor_prop_sum_var1 sums to 1 within each factor level:

out_df

Source: local data frame [8 x 4]
Groups: group [4]

    group   factor sum_var1 factor_prop_sum_var1
   <fctr>   <fctr>    <int>                <dbl>
1 Group A Factor 1       26            0.3170732
2 Group B Factor 1       17            0.2073171
3 Group C Factor 1       19            0.2317073
4 Group D Factor 1       18            0.2195122
5 Group A Factor 2        8            0.1481481
6 Group B Factor 2       19            0.3518519
7 Group C Factor 2        7            0.1296296
8 Group D Factor 2       22            0.4074074

out_df %>% group_by(factor) %>% summarize(checking = sum(factor_prop_sum_var1))

# A tibble: 2 × 2
    factor checking
    <fctr>    <dbl>
1 Factor 1        1
2 Factor 2        1
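The sum-to-one check can also be turned into an assertion that stops with an error, instead of relying on reading the printed tibble. A minimal sketch, assuming a hypothetical `props` table with made-up proportions laid out like out_df; `all.equal()` is used so that tiny floating-point differences do not trigger a failure.

```r
library(dplyr)

# Hypothetical table with a proportion column (made-up values)
props <- data.frame(
  factor = c("Factor 1", "Factor 1", "Factor 2", "Factor 2"),
  factor_prop_sum_var1 = c(0.75, 0.25, 0.1, 0.9)
)

# One row per factor level with the sum of its proportions
checks <- props %>%
  group_by(factor) %>%
  summarize(checking = sum(factor_prop_sum_var1))

# Stop with an error unless every factor level sums to 1 (within floating-point tolerance)
stopifnot(isTRUE(all.equal(checks$checking, rep(1, nrow(checks)))))
```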

This works, but it is clunky at best. Is there a more elegant way to do this, ideally inside a dplyr pipe?

Best answer

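To get proportions within groups, group only by the column(s) within which you want the proportions to add up to 100%. In this case, after computing the sum for each combination of group and factor, call group_by() again, this time by factor alone, and then compute the proportion with mutate(). Because the new group_by() replaces the existing grouping, sum(sum_var1) inside mutate() is the total for each factor level.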

library(dplyr)

set.seed(100)
data <- data.frame(
  group = sample(rep(c("Group A", "Group B", "Group C", "Group D"), 4), 16, replace = FALSE),
  factor = sample(rep(c("Factor 1", "Factor 2"), 8), 16, replace = FALSE),
  var1 = sample(1:16)
)

data %>%
  group_by(group, factor) %>%
  summarize(sum_var1 = sum(var1)) %>%
  group_by(factor) %>%
  mutate(percent = sum_var1 / sum(sum_var1)) %>%
  arrange(factor)

    group   factor sum_var1    percent
1 Group A Factor 1       13 0.25000000
2 Group B Factor 1        8 0.15384615
3 Group C Factor 1       21 0.40384615
4 Group D Factor 1       10 0.19230769
5 Group A Factor 2       20 0.23809524
6 Group B Factor 2       27 0.32142857
7 Group C Factor 2        2 0.02380952
8 Group D Factor 2       35 0.41666667
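As the question's note about count() suggests, the same regrouping works when the summary is a row count rather than a sum. A minimal sketch on a hypothetical, deterministic data frame (made-up rows, no var1 column needed): `count()` returns one row per group/factor pair with the count in a column named `n`, and regrouping by `factor` alone makes `sum(n)` the total within each factor level.

```r
library(dplyr)

# Hypothetical deterministic input (made-up rows, not the question's random data)
data <- data.frame(
  group  = c("Group A", "Group A", "Group B", "Group B", "Group A", "Group B"),
  factor = c("Factor 1", "Factor 2", "Factor 1", "Factor 2", "Factor 1", "Factor 2")
)

result <- data %>%
  count(group, factor) %>%   # one row per group/factor pair; row count stored in n
  group_by(factor) %>%       # regroup so sum(n) is the total for each factor level
  mutate(prop = n / sum(n)) %>%
  ungroup() %>%              # drop the leftover grouping by factor
  arrange(factor, group)

print(result)
```

Dropping the grouping with `ungroup()` at the end is optional; it just keeps later steps from being silently computed per factor level.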

For more on r - Finding proportions of grouped observations with dplyr in R, see the similar question on Stack Overflow: https://stackoverflow.com/questions/40621167/
