
r - Output a series of summary statistics tables for each user/participant with the Tidyverse

Reposted. Author: 行者123. Updated: 2023-12-04 01:11:13

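I'm trying to produce a table of mean scores for each participant in my tibble. There are far more observations than in the data below, but this tibble should be enough. I need one table per unique user_id. Each table should have 10 rows: 8 rows holding the mean of indicators 1-8 at each timepoint, and 2 rows holding the domain means at each timepoint. The Domain 0 mean is the mean of indicators 1-4, and the Domain 1 mean is the mean of indicators 5-8. The table should also have four columns, one per timepoint, so each user_id's output table should be 10 x 4. I've tried this with the tidyverse and would appreciate help. Also, some users (read: a few) won't have values at every timepoint.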
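As a worked example of the domain definition, here is a minimal sketch using Bob's Time 3 scores for indicators 1-8, copied from the sample data (`bob_t3` is a hypothetical name used only for illustration):

```r
# Bob's Time 3 scores for indicators 1-8, copied from the sample data
bob_t3 <- c(2, 2.5, 1.5, 0.5, 2, 2.5, 2.5, 3)

domain0 <- mean(bob_t3[1:4])  # indicators 1-4: (2 + 2.5 + 1.5 + 0.5) / 4
domain1 <- mean(bob_t3[5:8])  # indicators 5-8: (2 + 2.5 + 2.5 + 3) / 4

print(c(domain0 = domain0, domain1 = domain1))  # domain0 1.625, domain1 2.5
```

These are the two Domain values in the Time 3 column of the desired table for Bob.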

d <- structure(list(Group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1), user_id = c("Kim", "Kim",
"Kim", "Kim", "Kim", "Kim", "Kim",
"Kim", "Bob", "Bob", "Bob", "Bob",
"Bob", "Bob", "Bob", "Bob", "Bob",
"Bob", "Bob", "Bob", "Bob", "Bob",
"Bob", "Bob", "George", "George", "George", "George",
"George", "George", "George", "George", "George", "George", "George",
"George", "George", "George", "George", "George"), indicator = c("1",
"2", "3", "4", "5", "6", "7", "8", "1", "1", "2", "2", "3", "3",
"4", "4", "5", "5", "6", "6", "7", "7", "8", "8", "1", "1", "2",
"2", "3", "3", "4", "4", "5", "5", "6", "6", "7", "7", "8", "8"
), Timepoint = c(1, 1, 1, 1, 1, 1, 1, 1, 3, 4, 3, 4, 3,
4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4,
3, 4, 3, 4, 3, 4), score = c(3.5, 3.5, 2, 3, 3.5, 4,
3, 4, 2, 3, 2.5, 3, 1.5, 1.5, 0.5, 3, 2, 4, 2.5, 4, 2.5, 3.5,
3, 3.5, 3.5, 3, 2.5, 2.5, 2.5, 2, 2, 3, 3.5, 3.5, 3.5, 3.5, 3,
3, 3, 2.5)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-40L))
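To see which timepoints each user actually has, a cross-tabulation is a quick check. This sketch rebuilds only the `user_id` and `Timepoint` columns of the sample data with `rep()`, assuming the row order matches the `structure()` output above:

```r
# user_id and Timepoint columns of the sample data, rebuilt with rep()
user_id   <- rep(c("Kim", "Bob", "George"), times = c(8, 16, 16))
Timepoint <- c(rep(1, 8), rep(c(3, 4), 16))

# Rows per user and timepoint; a zero count marks a missing timepoint
coverage <- table(user_id, Timepoint)
print(coverage)
```

Kim has only Time 1, Bob and George only Times 3 and 4, and no user has Time 2, so a wide reshape only creates columns for timepoints that occur unless the missing ones are added explicitly.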

Attempted tidyverse code:

user_tables <- d %>% 
group_by(user_id,indicator,Timepoint) %>%
summarise(Time1 = mean[which(indicator == 1 & Timepoint == 1)], mean[which(indicator == 2 & Timepoint == 1)], mean[which(indicator == 3 & Timepoint == 1)], mean[which(indicator == 4 & Timepoint == 1)], mean[which(indicator == 5 & Timepoint == 1)], mean[which(indicator == 6 & Timepoint == 1)], mean[which(indicator == 7 & Timepoint == 1)], mean[which(indicator == 8 & Timepoint == 1)],
Time2 = mean[which(indicator == 1 & Timepoint == 2)], mean[which(indicator == 2 & Timepoint == 2)], mean[which(indicator == 3 & Timepoint == 2)], mean[which(indicator == 4 & Timepoint == 2)], mean[which(indicator == 5 & Timepoint == 2)], mean[which(indicator == 6 & Timepoint == 2)], mean[which(indicator == 7 & Timepoint == 2)], mean[which(indicator == 8 & Timepoint == 2)],
Time3 = mean[which(indicator == 1 & Timepoint == 3)], mean[which(indicator == 2 & Timepoint == 3)], mean[which(indicator == 3 & Timepoint == 3)], mean[which(indicator == 4 & Timepoint == 3)], mean[which(indicator == 5 & Timepoint == 3)], mean[which(indicator == 6 & Timepoint == 3)], mean[which(indicator == 7 & Timepoint == 3)], mean[which(indicator == 8 & Timepoint == 3)],
Time4 = mean[which(indicator == 1 & Timepoint == 4)], mean[which(indicator == 2 & Timepoint == 4)], mean[which(indicator == 3 & Timepoint == 4)], mean[which(indicator == 4 & Timepoint == 4)], mean[which(indicator == 5 & Timepoint == 4)], mean[which(indicator == 6 & Timepoint == 4)], mean[which(indicator == 7 & Timepoint == 4)], mean[which(indicator == 8 & Timepoint == 4)]) %>%
split(., .$user_id)
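The attempt fails because `mean[which(...)]` indexes the function `mean` itself rather than the `score` column, and R cannot subset a function (the error is "object of type 'closure' is not subsettable"). Also, inside `summarise()` only the first expression after `Time1 =` is named; the others become separate unnamed summaries. The usual idiom is to subset the values first and then call `mean()`. A minimal base-R illustration using Bob's first four rows from the sample data:

```r
# Bob's first four rows from the sample data
score     <- c(2, 3, 2.5, 3)
indicator <- c("1", "1", "2", "2")
Timepoint <- c(3, 4, 3, 4)

# What the attempt does: subsetting the function `mean` raises an error
err <- tryCatch(mean[which(indicator == "1" & Timepoint == 3)],
                error = function(e) conditionMessage(e))
print(err)

# Subset the score values, then average them
m <- mean(score[indicator == "1" & Timepoint == 3])
print(m)  # 2
```

When the data are already grouped by `user_id`, `indicator` and `Timepoint`, each group holds a single indicator/timepoint combination, so `summarise(mean = mean(score))` is enough; the answer below relies on the same idea.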

Ultimately, I'd like one table like this for each user, with NA where appropriate (note: this one is for Bob, who has no scores at Time 1 or Time 2):

structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, 1.625, 2, 2.5, 1.5, 0.5, 2.5, 2,
2.5, 2.5, 3, 2.625, 3, 3, 1.5, 3, 3.75, 4, 4, 3.5, 3.5), .Dim = c(10L,
4L), .Dimnames = list(c("Domain 0", "Ind 1", "Ind 2", "Ind 3",
"Ind 4", "Domain 1", "Ind 5", "Ind 6", "Ind 7", "Ind 8"), c("Time 1",
"Time 2", "Time 3", "Time 4")))
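A base-R sketch (not tidyverse, and not the answer's method) that builds exactly this 10 x 4 layout for every user. `user_table()` is a hypothetical helper; the data frame is the sample data rebuilt compactly, with `indicator` stored as integer rather than character for simplicity. Calling `tapply()` with factors whose levels are fixed to 1-8 and 1-4 guarantees all ten rows and four columns, with `NA` wherever a user has no scores:

```r
# Sample data from the question, rebuilt compactly (indicator as integer here)
d <- data.frame(
  user_id   = rep(c("Kim", "Bob", "George"), times = c(8, 16, 16)),
  indicator = c(1:8, rep(1:8, each = 2), rep(1:8, each = 2)),
  Timepoint = c(rep(1, 8), rep(c(3, 4), 16)),
  score     = c(3.5, 3.5, 2, 3, 3.5, 4, 3, 4,
                2, 3, 2.5, 3, 1.5, 1.5, 0.5, 3, 2, 4, 2.5, 4, 2.5, 3.5, 3, 3.5,
                3.5, 3, 2.5, 2.5, 2.5, 2, 2, 3, 3.5, 3.5, 3.5, 3.5, 3, 3, 3, 2.5)
)

# Hypothetical helper: one user's rows -> 10 x 4 matrix of means
user_table <- function(u, timepoints = 1:4) {
  # Indicator (rows) x timepoint (columns) means; empty cells become NA
  ind <- tapply(u$score,
                list(factor(u$indicator, levels = 1:8),
                     factor(u$Timepoint, levels = timepoints)),
                mean)
  # Domain means: mean of the indicator means for 1-4 and for 5-8
  dom0 <- colMeans(ind[1:4, , drop = FALSE])
  dom1 <- colMeans(ind[5:8, , drop = FALSE])
  out <- rbind(dom0, ind[1:4, , drop = FALSE], dom1, ind[5:8, , drop = FALSE])
  dimnames(out) <- list(
    c("Domain 0", paste("Ind", 1:4), "Domain 1", paste("Ind", 5:8)),
    paste("Time", timepoints)
  )
  out
}

# One table per user, named by user_id
tables <- lapply(split(d, d$user_id), user_table)
print(tables$Bob)
```

Because `colMeans()` is called without `na.rm`, a domain mean is `NA` whenever any of its four indicators is missing at that timepoint; passing `na.rm = TRUE` gives a partial mean instead, and an all-missing column then yields `NaN`, as in the answer's output.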

Thanks!

Best answer

Since you are adding rows, you could do:

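library(dplyr)
library(tidyr)
library(purrr)

d %>%
  # mean score per user, timepoint, domain (0 = indicators 1-4, 1 = indicators 5-8) and indicator
  group_by(Group, user_id, Timepoint, domain = +(as.numeric(indicator) > 4), indicator) %>%
  summarise(sc = mean(score), .groups = 'drop_last') %>%
  # one column per timepoint that occurs in the data (Time_3, Time_4, Time_1, ...)
  pivot_wider(id_cols = c(Group, user_id, indicator, domain),
              names_from = Timepoint, names_prefix = 'Time_', values_from = sc) %>%
  # nest by the remaining grouping variables (Group, user_id, domain)
  group_nest() %>%
  # prepend a row holding the domain means (column means of the indicator rows)
  mutate(data = map(data,
    ~rbind(c(NA, colMeans(select_if(.x, is.numeric), na.rm = TRUE)), .x))) %>%
  unnest(data) %>%
  # the prepended row has indicator NA: label it "Domain 0"/"Domain 1"
  mutate(indicator = ifelse(is.na(indicator),
                            paste0('Domain ', domain), paste0('Ind ', indicator)),
         domain = NULL)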

# A tibble: 30 x 6
   Group user_id indicator Time_3 Time_4 Time_1
   <dbl> <chr>   <chr>      <dbl>  <dbl>  <dbl>
 1     1 Bob     Domain 0    1.62   2.62    NaN
 2     1 Bob     Ind 1       2      3        NA
 3     1 Bob     Ind 2       2.5    3        NA
 4     1 Bob     Ind 3       1.5    1.5      NA
 5     1 Bob     Ind 4       0.5    3        NA
 6     1 Bob     Domain 1    2.5    3.75    NaN
 7     1 Bob     Ind 5       2      4        NA
 8     1 Bob     Ind 6       2.5    4        NA
 9     1 Bob     Ind 7       2.5    3.5      NA
10     1 Bob     Ind 8       3      3.5      NA
# ... with 20 more rows
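The answer returns one long tibble whose `Time_*` columns exist only for timepoints present in the data (so there is no `Time_2`), in order of first appearance. To reach the one-table-per-user layout the question asks for, the result can be post-processed. A base-R sketch, where `res` is a hypothetical stand-in holding Bob's ten rows copied from the output above (with exact values, e.g. 1.625 where the tibble prints 1.62):

```r
# Hypothetical stand-in for the answer's result (Bob's rows only)
res <- data.frame(
  Group     = 1,
  user_id   = "Bob",
  indicator = c("Domain 0", paste("Ind", 1:4), "Domain 1", paste("Ind", 5:8)),
  Time_3    = c(1.625, 2, 2.5, 1.5, 0.5, 2.5, 2, 2.5, 2.5, 3),
  Time_4    = c(2.625, 3, 3, 1.5, 3, 3.75, 4, 4, 3.5, 3.5),
  Time_1    = c(NaN, NA, NA, NA, NA, NaN, NA, NA, NA, NA)
)

time_cols <- paste0("Time_", 1:4)

# Add any timepoint column that never occurred in the data (here Time_2)
for (tc in setdiff(time_cols, names(res))) res[[tc]] <- NA_real_

# Turn NaN (mean of an all-missing column) into NA
res[time_cols] <- lapply(res[time_cols], function(x) replace(x, is.nan(x), NA))

# One 10 x 4 matrix per user, columns in time order, rows labelled
tables <- lapply(split(res, res$user_id), function(u) {
  m <- as.matrix(u[time_cols])
  rownames(m) <- u$indicator
  m
})
print(tables$Bob)
```

Fixing `time_cols` up front keeps every user's table at four columns even when a timepoint never occurs in the data.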

Regarding "r - Output a series of summary statistics tables for each user/participant with the Tidyverse", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64799132/
