
r - Why does the dplyr::mutate function give the wrong answer?

Reposted. Author: 行者123. Updated: 2023-12-05 02:45:35

I don't know what is wrong with my code. I want to create a column containing total/sum(total). However, every row just contains the value 1. I wrote this code:

eth_3 <- pop %>%
  filter(ETHNICITY == 2) %>%
  group_by(PROVINCE, ETHNICITY) %>%
  summarise(total = sum(WEIGHT)) %>%
  select(PROVINCE, total) %>%
  mutate(S = total/sum(total))

and got this result:

PROVINCE   total     S
<int> <dbl> <dbl>
1 11 93925. 1
2 12 2016. 1
3 13 40 1
4 14 255. 1
5 16 10 1
6 18 58.3 1

The output should be:

   PROVINCE    total         S
<int> <dbl> <dbl>
1 11 93925. 0.968
2 12 2016. 0.0208
3 13 40 0.000412
4 14 255. 0.00263
5 16 10 0.000103
6 18 58.3 0.000601
7 19 9.67 0.0000997
8 21 50.3 0.000519
9 31 34.7 0.000358
10 32 142. 0.00147

Here is the dput() output of the result:

structure(list(PROVINCE = c(11L, 12L, 13L, 14L, 16L, 18L, 19L, 
21L, 31L, 32L, 33L, 34L, 35L, 36L, 52L, 62L, 63L, 64L, 74L, 81L,
91L), total = c(93925.4300413131, 2015.98999500274, 40, 255.349998474121,
10, 58.3199987411499, 9.6700000762939, 50.340000152588, 34.6899995803834,
142.189999580384, 30.0199995040892, 48.5600004196165, 160.789996147154,
60.8100004196172, 9.8800001144409, 52.199997901915, 21.60000038147,
19.7199993133544, 10.130000114441, 9.8999996185303, 28.1400003433227
), S = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1)), row.names = c(NA, -21L), groups = structure(list(PROVINCE = c(11L,
12L, 13L, 14L, 16L, 18L, 19L, 21L, 31L, 32L, 33L, 34L, 35L, 36L,
52L, 62L, 63L, 64L, 74L, 81L, 91L), .rows = structure(list(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,
15L, 16L, 17L, 18L, 19L, 20L, 21L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 21L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))

Best answer

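By default, summarise drops only the last level of grouping. So after summarise, your data is still grouped by PROVINCE. Each PROVINCE group then contains a single row, so within each group sum(total) equals total and S comes out as 1. You should ungroup before computing the proportion: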

library(dplyr)

eth_3 <- pop %>%
  filter(ETHNICITY == 2) %>%
  group_by(PROVINCE, ETHNICITY) %>%
  summarise(total = sum(WEIGHT)) %>%
  select(PROVINCE, total) %>%
  # remove the remaining PROVINCE grouping so sum(total) spans all rows
  ungroup() %>%
  mutate(S = total/sum(total))
  # alternative to the line above: mutate(S = prop.table(total))
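
To see the leftover grouping directly, here is a minimal sketch on made-up data. The `toy` tibble and its values are hypothetical stand-ins for `pop`, which is not shown in the question; only the column names are taken from it.

```r
library(dplyr)

# Hypothetical toy data standing in for `pop` (values are made up)
toy <- tibble(
  PROVINCE  = c(11L, 11L, 12L, 13L),
  ETHNICITY = c(2L, 2L, 2L, 2L),
  WEIGHT    = c(10, 20, 30, 40)
)

# No .groups argument, so summarise drops only the last grouping variable
grouped <- toy %>%
  group_by(PROVINCE, ETHNICITY) %>%
  summarise(total = sum(WEIGHT))

# The result is still grouped by PROVINCE
group_vars(grouped)

# One row per PROVINCE group, so sum(total) equals total and S is 1 everywhere
still_grouped <- grouped %>% mutate(S = total / sum(total))

# After ungroup(), sum(total) runs over all rows
fixed <- grouped %>% ungroup() %>% mutate(S = total / sum(total))
```

Printing `still_grouped` and `fixed` side by side should reproduce the symptom from the question and the corrected proportions, respectively.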

If you have dplyr >= 1.0.0, you can specify .groups = 'drop' in summarise instead of calling ungroup:

pop %>%
  filter(ETHNICITY == 2) %>%
  group_by(PROVINCE, ETHNICITY) %>%
  summarise(total = sum(WEIGHT), .groups = 'drop') %>%
  select(PROVINCE, total) %>%
  mutate(S = total/sum(total))
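
As a quick sanity check of the `.groups = 'drop'` variant, the sketch below runs the same pipeline on made-up data and confirms that the result is no longer grouped and that the proportions add up to 1. The `toy` tibble is a hypothetical stand-in for `pop`, not data from the question.

```r
library(dplyr)

# Hypothetical toy data standing in for `pop` (values are made up)
toy <- tibble(
  PROVINCE  = c(11L, 11L, 12L, 13L),
  ETHNICITY = c(2L, 2L, 2L, 2L),
  WEIGHT    = c(10, 20, 30, 40)
)

via_drop <- toy %>%
  filter(ETHNICITY == 2) %>%
  group_by(PROVINCE, ETHNICITY) %>%
  summarise(total = sum(WEIGHT), .groups = "drop") %>%
  select(PROVINCE, total) %>%
  mutate(S = total / sum(total))

# No grouping is left after .groups = "drop"
is_grouped_df(via_drop)

# The shares should add up to 1, up to floating-point error
all.equal(sum(via_drop$S), 1)
```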

Regarding "r - Why does the dplyr::mutate function give the wrong answer?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65896735/
