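I'm trying to generate a table of mean scores for each participant in my tibble. There are many more observations than in the data given below, but this tibble should be enough. I need one table for each unique user_id. I'd like each table to have 10 rows: 8 of them are the means of indicators 1-8 at each time point, and the other two are the domain means at each time point. The Domain 0 mean is the mean of indicators 1-4, and the Domain 1 mean is the mean of indicators 5-8. I also want the output table to have four columns, one per time point, so the output table for each user_id should be 10 x 4. I've tried this with the tidyverse and would appreciate help. Also, some users (read: a few) won't have values at every time point.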
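Written out, with $\bar{x}_{i,t}$ standing for a user's mean score on indicator $i$ at time point $t$ and $\bar{D}_{k,t}$ for the Domain $k$ mean (notation introduced here only for illustration), the two domain rows are simple averages. The last two lines plug in Bob's Time 3 and Time 4 scores from the data below, assuming all four indicators of each domain are present:

```latex
\begin{aligned}
\bar{D}_{0,t} &= \frac{1}{4}\sum_{i=1}^{4} \bar{x}_{i,t}, &
\bar{D}_{1,t} &= \frac{1}{4}\sum_{i=5}^{8} \bar{x}_{i,t},\\
\bar{D}_{0,3} &= \frac{2 + 2.5 + 1.5 + 0.5}{4} = 1.625, &
\bar{D}_{0,4} &= \frac{3 + 3 + 1.5 + 3}{4} = 2.625,\\
\bar{D}_{1,3} &= \frac{2 + 2.5 + 2.5 + 3}{4} = 2.5, &
\bar{D}_{1,4} &= \frac{4 + 4 + 3.5 + 3.5}{4} = 3.75.
\end{aligned}
```

These four values are the Domain 0 and Domain 1 entries in Bob's expected table.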
structure(list(Group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1), user_id = c("Kim", "Kim",
"Kim", "Kim", "Kim", "Kim", "Kim",
"Kim", "Bob", "Bob", "Bob", "Bob",
"Bob", "Bob", "Bob", "Bob", "Bob",
"Bob", "Bob", "Bob", "Bob", "Bob",
"Bob", "Bob", "George", "George", "George", "George",
"George", "George", "George", "George", "George", "George", "George",
"George", "George", "George", "George", "George"), indicator = c("1",
"2", "3", "4", "5", "6", "7", "8", "1", "1", "2", "2", "3", "3",
"4", "4", "5", "5", "6", "6", "7", "7", "8", "8", "1", "1", "2",
"2", "3", "3", "4", "4", "5", "5", "6", "6", "7", "7", "8", "8"
), Timepoint = c(1, 1, 1, 1, 1, 1, 1, 1, 3, 4, 3, 4, 3,
4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4,
3, 4, 3, 4, 3, 4), score = c(3.5, 3.5, 2, 3, 3.5, 4,
3, 4, 2, 3, 2.5, 3, 1.5, 1.5, 0.5, 3, 2, 4, 2.5, 4, 2.5, 3.5,
3, 3.5, 3.5, 3, 2.5, 2.5, 2.5, 2, 2, 3, 3.5, 3.5, 3.5, 3.5, 3,
3, 3, 2.5)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-40L))
Attempted tidyverse code:
user_tables <- d %>%
group_by(user_id,indicator,Timepoint) %>%
summarise(Time1 = mean[which(indicator == 1 & Timepoint == 1)], mean[which(indicator == 2 & Timepoint == 1)], mean[which(indicator == 3 & Timepoint == 1)], mean[which(indicator == 4 & Timepoint == 1)], mean[which(indicator == 5 & Timepoint == 1)], mean[which(indicator == 6 & Timepoint == 1)], mean[which(indicator == 7 & Timepoint == 1)], mean[which(indicator == 8 & Timepoint == 1)],
Time2 = mean[which(indicator == 1 & Timepoint == 2)], mean[which(indicator == 2 & Timepoint == 2)], mean[which(indicator == 3 & Timepoint == 2)], mean[which(indicator == 4 & Timepoint == 2)], mean[which(indicator == 5 & Timepoint == 2)], mean[which(indicator == 6 & Timepoint == 2)], mean[which(indicator == 7 & Timepoint == 2)], mean[which(indicator == 8 & Timepoint == 2)],
Time3 = mean[which(indicator == 1 & Timepoint == 3)], mean[which(indicator == 2 & Timepoint == 3)], mean[which(indicator == 3 & Timepoint == 3)], mean[which(indicator == 4 & Timepoint == 3)], mean[which(indicator == 5 & Timepoint == 3)], mean[which(indicator == 6 & Timepoint == 3)], mean[which(indicator == 7 & Timepoint == 3)], mean[which(indicator == 8 & Timepoint == 3)],
Time4 = mean[which(indicator == 1 & Timepoint == 4)], mean[which(indicator == 2 & Timepoint == 4)], mean[which(indicator == 3 & Timepoint == 4)], mean[which(indicator == 4 & Timepoint == 4)], mean[which(indicator == 5 & Timepoint == 4)], mean[which(indicator == 6 & Timepoint == 4)], mean[which(indicator == 7 & Timepoint == 4)], mean[which(indicator == 8 & Timepoint == 4)]) %>%
split(., .$user_id)
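The attempt above cannot work as written: `mean` is a function, so `mean[...]` tries to index the function itself, and R stops with an "object of type 'closure' is not subsettable" error. The usual pattern is to subset the `score` vector first and pass the result to `mean()`. A minimal base-R sketch using Bob's first four rows from the data (the standalone vectors stand in for the columns of `d`):

```r
# Bob's first four rows from the data; these vectors stand in for d's columns
indicator <- c("1", "1", "2", "2")
Timepoint <- c(3, 4, 3, 4)
score     <- c(2, 3, 2.5, 3)

# Wrong: `mean` is a function (a closure), so it cannot be indexed
res <- tryCatch(mean[which(indicator == "1" & Timepoint == 3)],
                error = function(e) conditionMessage(e))
print(res)

# Right: subset the score vector, then pass it to mean()
m_ind1_t3 <- mean(score[indicator == "1" & Timepoint == 3])
print(m_ind1_t3)   # 2

# A time point with no rows gives an empty vector, and mean() of it is NaN
m_ind1_t1 <- mean(score[indicator == "1" & Timepoint == 1])
print(m_ind1_t1)   # NaN
```

The `NaN` in the last case is worth keeping in mind for users who are missing whole time points.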
Ultimately, I'd like a table like this for each user, with NA where appropriate (note: this one is for Bob, who has no scores at Time 1 or Time 2):
structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 1.625, 2, 2.5, 1.5, 0.5, 2.5, 2,
2.5, 2.5, 3, 2.625, 3, 3, 1.5, 3, 3.75, 4, 4, 3.5, 3.5), .Dim = c(10L,
4L), .Dimnames = list(c("Domain 0", "Ind 1", "Ind 2", "Ind 3",
"Ind 4", "Domain 1", "Ind 5", "Ind 6", "Ind 7", "Ind 8"), c("Time 1",
"Time 2", "Time 3", "Time 4")))
Thanks!
Best answer
Since you are adding rows (the two domain rows), you could do:
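library(dplyr)
library(tidyr)
library(purrr)

d %>%
  # domain = 0 for indicators 1-4, 1 for indicators 5-8 (compared as numbers, not strings)
  group_by(Group, user_id, Timepoint, domain = +(as.numeric(indicator) > 4), indicator) %>%
  summarise(sc = mean(score), .groups = 'drop') %>%
  # one Time_<n> column per time point that occurs in the data
  pivot_wider(id_cols = c(Group, user_id, indicator, domain),
              names_from = Timepoint, names_prefix = 'Time_', values_from = sc) %>%
  group_nest(Group, user_id, domain) %>%
  # prepend a domain-mean row (indicator = NA) to each user/domain block
  mutate(data = map(data, function(x) bind_rows(
    summarise(x, indicator = NA_character_,
              across(where(is.numeric), function(v) mean(v, na.rm = TRUE))),
    x))) %>%
  unnest(data) %>%
  # label the rows "Domain 0"/"Domain 1" and "Ind 1".."Ind 8"
  mutate(indicator = ifelse(is.na(indicator),
                            paste0('Domain ', domain), paste0('Ind ', indicator)),
         domain = NULL)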
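The answer's `pivot_wider()` step only creates columns for time points that actually occur in the data, so the result has no `Time_2` column and the columns come out in order of first appearance (`Time_3`, `Time_4`, `Time_1`). If you need exactly the 10 x 4 layout from the question, with Time 1 to Time 4 always present and `NA` where a user has no data, one base-R alternative (a different technique from the answer, sketched here under assumptions) is to fill a fixed matrix per user with `tapply()`. The helper name `user_table` is hypothetical, the time points are assumed to be 1 to 4, and only Kim's and Bob's rows from the question's data are reproduced to keep the block short:

```r
# Kim's and Bob's rows, copied from the dput() data in the question
kim <- data.frame(user_id = "Kim", indicator = as.character(1:8), Timepoint = 1,
                  score = c(3.5, 3.5, 2, 3, 3.5, 4, 3, 4))
bob <- data.frame(user_id = "Bob", indicator = as.character(rep(1:8, each = 2)),
                  Timepoint = rep(c(3, 4), times = 8),
                  score = c(2, 3, 2.5, 3, 1.5, 1.5, 0.5, 3,
                            2, 4, 2.5, 4, 2.5, 3.5, 3, 3.5))
d <- rbind(kim, bob)

# Hypothetical helper: one 10 x 4 table (Domain rows first) for one user
user_table <- function(u, times = 1:4) {
  # Mean score per indicator x time point; the fixed factor levels keep all
  # 8 indicators and all time points, and empty cells come back as NA
  m <- tapply(u$score,
              list(factor(as.integer(u$indicator), levels = 1:8),
                   factor(u$Timepoint, levels = times)),
              mean)
  # Domain mean over the indicators present; colMeans() gives NaN for a
  # column with no data, which is turned into NA to match the requested table
  domain_mean <- function(rows) {
    x <- colMeans(m[rows, , drop = FALSE], na.rm = TRUE)
    x[is.nan(x)] <- NA
    x
  }
  out <- rbind(domain_mean(1:4), m[1:4, , drop = FALSE],
               domain_mean(5:8), m[5:8, , drop = FALSE])
  dimnames(out) <- list(
    c("Domain 0", paste("Ind", 1:4), "Domain 1", paste("Ind", 5:8)),
    paste("Time", times))
  out
}

# One table per user, in a list named by user_id
tables <- lapply(split(d, d$user_id), user_table)
tables$Bob["Domain 0", "Time 3"]   # 1.625
```

Because the rows and columns come from fixed factor levels rather than from whatever appears in the data, Bob still gets all-`NA` Time 1 and Time 2 columns, and his table matches the `structure()` shown in the question.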
# A tibble: 30 x 6
   Group user_id indicator Time_3 Time_4 Time_1
   <dbl> <chr>   <chr>      <dbl>  <dbl>  <dbl>
 1     1 Bob     Domain 0   1.62   2.62     NaN
 2     1 Bob     Ind 1      2      3         NA
 3     1 Bob     Ind 2      2.5    3         NA
 4     1 Bob     Ind 3      1.5    1.5       NA
 5     1 Bob     Ind 4      0.5    3         NA
 6     1 Bob     Domain 1   2.5    3.75     NaN
 7     1 Bob     Ind 5      2      4         NA
 8     1 Bob     Ind 6      2.5    4         NA
 9     1 Bob     Ind 7      2.5    3.5       NA
10     1 Bob     Ind 8      3      3.5       NA
# ... with 20 more rows
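The `NaN` entries in `Time_1` on the Domain rows are expected: Bob has no Time 1 scores, and once `na.rm = TRUE` removes the missing values there is nothing left to average, so the mean is `NaN` (not a number) rather than `NA`. If you want `NA` there, as in the requested table, one option is to replace `NaN` afterwards. A small base-R illustration using Bob's Domain 0 block from the output above:

```r
# Bob's Domain 0 indicators at the three observed time points; Time_1 is all NA
block <- data.frame(Time_3 = c(2, 2.5, 1.5, 0.5),
                    Time_4 = c(3, 3, 1.5, 3),
                    Time_1 = rep(NA_real_, 4))

# With na.rm = TRUE an all-NA column leaves nothing to average, so its mean is NaN
dm <- colMeans(block, na.rm = TRUE)
dm   # Time_3 1.625, Time_4 2.625, Time_1 NaN

# Replace NaN with NA so "no data" looks the same as in the indicator rows
dm[is.nan(dm)] <- NA
dm
```

Inside the answer's pipeline, the same replacement could be appended as a final step, for example `mutate(across(starts_with('Time_'), ~ replace(.x, is.nan(.x), NA)))`; this is a suggestion, not part of the original answer.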
Regarding "r - Output a series of summary-statistics tables for each user/participant with the Tidyverse", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/64799132/