
r - Adding an overall mean when using group_by

Reposted. Author: 行者123. Updated: 2023-12-04 02:56:15

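I am using the dplyr package to build some tables, together with the adorn_totals("row") function from the janitor package.

This works well when I want to sum the values within groups, but in some cases I want an overall mean rather than a sum. Is there an adorn_means function?

Example code: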

library(dplyr)
library(janitor)  # provides adorn_totals()

Regions2 <- Data %>%
  filter(!is.na(REGION)) %>%
  group_by(REGION) %>%
  summarise(NumberofPeople = length(Names)) %>%
  adorn_totals("row")

Here my "Total" row is simply the sum of all the people across the regions. This gives me:
REGION           NumberofPeople
East Midlands           578,943
East of England         682,917
London                1,247,540
North East              245,830
North West              742,886
South East              963,040
South West              623,684
West Midlands           653,335
Yorkshire               553,853
TOTAL                 6,292,028

My next piece of code produces the average salary for each region, but I want the total row to hold an overall average:
Regions3 <- Data %>%
  filter(!is.na(REGION)) %>%
  filter(!is.na(AVGSalary)) %>%
  group_by(REGION) %>%
  summarise(AverageSalary = mean(AVGSalary))

If I use adorn_totals("row") as before, I just get the sum of the averages, not the overall average of the dataset.

How do I get the overall average?

Update with some noddy data:

Data
people  region      salary
person1 London 1000
person2 South West 1050
person3 South East 900
person4 London 800
person5 Scotland 1020
person6 South West 750
person7 East 600
person8 London 1200
person9 South West 1150

So the group means are:
London      1000
South West 983.33
South East 900
Scotland 1020
East 600

I would like to add the overall figure at the bottom:
Total    941.11

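Best answer

1) The overall mean is a weighted average of the group means, with each group weighted by its row count, not the plain average of the means (here it is 941 rather than 901). We therefore keep an n column so that the overall mean can be computed correctly at the end. Although the data shown has no NAs, we use drop_na so that the code also works with data that does. This removes any row containing an NA.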

library(dplyr)
library(tidyr)

Region %>%
  drop_na %>%
  group_by(region) %>%
  summarize(avg = mean(salary), n = n()) %>%  # keep group sizes as weights
  ungroup %>%
  # append one row: the group means weighted by their group sizes
  bind_rows(summarize(., region = "Overall Avg",
                      avg = sum(avg * n) / sum(n),
                      n = sum(n))) %>%
  select(-n)

giving:
# A tibble: 6 x 2
  region        avg
  <chr>       <dbl>
1 East          600
2 London       1000
3 Scotland     1020
4 South East    900
5 South West    983.
6 Overall Avg   941.
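As a quick check of the arithmetic behind the 941 versus 901 remark, here is a minimal base R sketch. It assumes the nine sample rows from the question, retyped by hand as two vectors. The variable names are illustrative only and are not part of the answer's code.

```r
# Sample data retyped from the question (assumed to match the posted table)
salary <- c(1000, 1050, 900, 800, 1020, 750, 600, 1200, 1150)
region <- c("London", "South West", "South East", "London", "Scotland",
            "South West", "East", "London", "South West")

# Per-region mean and per-region row count, in the same region order
group_avg <- as.numeric(tapply(salary, region, mean))
group_n   <- as.numeric(tapply(salary, region, length))

# Unweighted average of the five group means (roughly 900.67)
plain_mean_of_means <- mean(group_avg)

# Group means weighted by group size (roughly 941.11)
weighted_mean <- weighted.mean(group_avg, group_n)

# Mean over all nine rows; this agrees with the weighted figure
overall_mean <- mean(salary)
```

The weighted figure matches the mean over the raw rows because London and South West contribute three rows each, while the other regions contribute one.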

2) An alternative is to build the overall-average row by going back to the original data:
Region %>%
  drop_na %>%
  group_by(region) %>%
  summarize(avg = mean(salary)) %>%
  ungroup %>%
  bind_rows(summarize(Region %>% drop_na, region = "Overall Avg", avg = mean(salary)))

giving:
# A tibble: 6 x 2
  region        avg
  <chr>       <dbl>
1 East          600
2 London       1000
3 Scotland     1020
4 South East    900
5 South West    983.
6 Overall Avg   941.

2a) If you object to referring to Region twice, then try this:
Region_ <- Region %>%
  drop_na

Region_ %>%
  group_by(region) %>%
  summarize(avg = mean(salary)) %>%
  ungroup %>%
  bind_rows(summarize(Region_, region = "Overall Avg", avg = mean(salary)))

2b) Or as a single pipeline, where Region_ is now local to the pipeline and is automatically removed once the pipeline completes:
Region %>%
  drop_na %>%
  { Region_ <- .
    Region_ %>%
      group_by(region) %>%
      summarize(avg = mean(salary)) %>%
      ungroup %>%
      bind_rows(summarize(Region_, region = "Overall Avg", avg = mean(salary)))
  }

Note

We used this as the input:
Lines <- "people  region      salary
person1 London 1000
person2 South West 1050
person3 South East 900
person4 London 800
person5 Scotland 1020
person6 South West 750
person7 East 600
person8 London 1200
person9 South West 1150"

library(gsubfn)
Region <- read.pattern(text = Lines, pattern = "^(\\S+) +(.*) (\\d+)$",
                       as.is = TRUE, skip = 1, strip.white = TRUE,
                       col.names = read.table(text = Lines, nrow = 1, as.is = TRUE))
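If gsubfn is not available, the same sample table could also be typed in directly. This is a minimal base R sketch rather than the answer's own method, and it assumes the nine rows shown in Lines above. One small difference: salary is stored as double here, which does not affect the means.

```r
# Hand-built equivalent of the sample input (an assumption-level substitute
# for the read.pattern() call; values are copied from the Lines string)
Region <- data.frame(
  people = paste0("person", 1:9),
  region = c("London", "South West", "South East", "London", "Scotland",
             "South West", "East", "London", "South West"),
  salary = c(1000, 1050, 900, 800, 1020, 750, 600, 1200, 1150),
  stringsAsFactors = FALSE  # keep region as character, like as.is = TRUE
)
```

The pipelines in 1), 2), 2a) and 2b) only use the region and salary columns, so they should run unchanged on this data frame.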

For more on r - Adding an overall mean when using group_by, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/52972628/
