
r - Purrr (or broom) for computing proportion tests on a grouped dataset (multiple proportion tests)

Reposted. Author: 行者123. Updated: 2023-12-03 19:15:07

Suppose I have a data frame with two columns, "year" and "cognitive impairment" (1 = yes, 0 = otherwise).

[Image: dataset]

I want to test the proportion for each year. For 2000, that would be:

df %>%
  filter(year == 2000) %>%
  {prop.test(rev(table(.$cogimp)), p = 0.5, conf.level = 0.95)}

I can check the result with:
prop.test(x = 3, n = 30, p = 0.5, conf.level=0.95)
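To make the check more concrete, here is a minimal sketch that recomputes the one-sample statistic by hand and compares it with `prop.test()`. It assumes the default settings used above (two-sided alternative, Yates continuity correction) and uses the counts for 2000 from the question (3 of 30). The variable names `p0`, `yates`, `stat_by_hand` and `p_by_hand` are introduced here for illustration only.

```r
# Counts for year 2000, as given in the question: 3 impaired out of 30
x <- 3
n <- 30
p0 <- 0.5  # null proportion being tested

# Yates continuity correction, capped so it never exceeds |x - n * p0|
yates <- min(0.5, abs(x - n * p0))

# Chi-squared statistic with 1 degree of freedom
stat_by_hand <- (abs(x - n * p0) - yates)^2 / (n * p0 * (1 - p0))
p_by_hand <- pchisq(stat_by_hand, df = 1, lower.tail = FALSE)

# The same test through prop.test()
res <- prop.test(x = x, n = n, p = p0, conf.level = 0.95)

# Both comparisons should report TRUE
all.equal(unname(res$statistic), stat_by_hand)
all.equal(res$p.value, p_by_hand)
```

If the two values agree, the per-year call is doing nothing more than this calculation on that year's counts, which is why it can be repeated mechanically for every group.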

However, it seems to me that I could simplify these analyses by using broom or purrr.
My goal is to end up with a table like this:

[Image: final table]

The code is as follows (it assumes dplyr is loaded for `%>%`, `filter()` and `count()`):
df <- structure(list(year = c(2000, 2000, 2015, 2015, 2000, 2015, 2000, 
2000, 2000, 2000, 2015, 2006, 2015, 2015, 2010, 2006, 2006, 2010,
2000, 2006, 2015, 2006, 2015, 2015, 2000, 2015, 2000, 2015, 2015,
2010, 2015, 2015, 2015, 2000, 2006, 2006, 2006, 2015, 2015, 2006,
2015, 2010, 2000, 2000, 2010, 2006, 2010, 2010, 2015, 2000, 2015,
2006, 2000, 2006, 2015, 2006, 2000, 2010, 2010, 2010, 2015, 2006,
2015, 2000, 2015, 2010, 2010, 2010, 2010, 2000, 2000, 2000, 2006,
2015, 2015, 2000, 2000, 2000, 2015, 2006, 2006, 2010, 2006, 2000,
2010, 2000, 2015, 2015, 2015, 2015, 2010, 2000, 2000, 2010, 2006,
2010, 2010, 2000, 2000, 2000), cogimp = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -100L), class = c("tbl_df",
"tbl", "data.frame"))

df %>%
  count(year, cogimp)

df %>%
  filter(year == 2006) %>%
  {prop.test(rev(table(.$cogimp)), p = 0.5, conf.level = 0.95)}

prop.test(x = 3, n = 30, p = 0.5, conf.level=0.95)
prop.test(x = 2, n = 19, p = 0.5, conf.level=0.95)

Best answer

Use tidy() from the broom package. Adapted from https://stackoverflow.com/a/30015869/13157536

library(dplyr)
library(broom)

df <- structure(list(year = c(2000, 2000, 2015, 2015, 2000, 2015, 2000,
2000, 2000, 2000, 2015, 2006, 2015, 2015, 2010, 2006, 2006, 2010,
2000, 2006, 2015, 2006, 2015, 2015, 2000, 2015, 2000, 2015, 2015,
2010, 2015, 2015, 2015, 2000, 2006, 2006, 2006, 2015, 2015, 2006,
2015, 2010, 2000, 2000, 2010, 2006, 2010, 2010, 2015, 2000, 2015,
2006, 2000, 2006, 2015, 2006, 2000, 2010, 2010, 2010, 2015, 2006,
2015, 2000, 2015, 2010, 2010, 2010, 2010, 2000, 2000, 2000, 2006,
2015, 2015, 2000, 2000, 2000, 2015, 2006, 2006, 2010, 2006, 2000,
2010, 2000, 2015, 2015, 2015, 2015, 2010, 2000, 2000, 2010, 2006,
2010, 2010, 2000, 2000, 2000), cogimp = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -100L), class = c("tbl_df",
"tbl", "data.frame"))

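# Count impaired cases and group sizes per year, then run one prop.test per year.
df_test <- df %>%
  group_by(year) %>%
  summarize(cogimp = sum(cogimp), n = n()) %>%
  group_by(year, cogimp, n) %>%
  do(fitYear = prop.test(.$cogimp, .$n, p = 0.5, conf.level = 0.95))

# Tidy the stored test objects. This call worked with the broom version
# current in April 2020; newer broom releases may no longer support it.
tidy(df_test, fitYear) %>%
  select(year, cogimp, n, p.value)
#> # A tibble: 4 x 4
#> # Groups:   year, cogimp, n [4]
#>    year cogimp     n   p.value
#>   <dbl>  <dbl> <int>     <dbl>
#> 1  2000      3    30 0.0000268
#> 2  2006      2    19 0.00132
#> 3  2010      8    20 0.502
#> 4  2015      3    31 0.0000163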

Created on 2020-04-06 by the reprex package (v0.3.0)
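Since the question also asks about purrr, here is a hedged sketch of the same analysis written with `purrr::map2()` and `broom::tidy()`. It is not part of the original answer. To stay short and self-contained it starts from the per-year counts shown in the output above rather than from the raw `df`; the names `counts` and `results` are made up for this example, and it assumes dplyr, purrr, tidyr and broom are installed.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(broom)

# Per-year counts copied from the answer's output above
counts <- tibble::tibble(
  year   = c(2000, 2006, 2010, 2015),
  cogimp = c(3, 2, 8, 3),
  n      = c(30, 19, 20, 31)
)

results <- counts %>%
  mutate(
    # One prop.test per row, stored in a list column
    test = map2(cogimp, n, ~ prop.test(.x, .y, p = 0.5, conf.level = 0.95)),
    # Turn each htest object into a one-row data frame
    tidied = map(test, tidy)
  ) %>%
  select(-test) %>%
  unnest(tidied)

# Keep the columns of interest: estimate, p-value and confidence interval
results %>%
  select(year, cogimp, n, estimate, p.value, conf.low, conf.high)
```

With the raw data, the `counts` table could instead be produced by the `group_by(year) %>% summarize(cogimp = sum(cogimp), n = n())` step from the answer. Keeping the tests in a list column and unnesting the tidied results avoids `do()` and returns the confidence limits alongside the p-values in one table.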

For "r - Purrr (or broom) for computing proportion tests on a grouped dataset (multiple proportion tests)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61063396/
