
r - Gather multiple sets of columns

Reposted. Author: 行者123. Updated: 2023-12-03 04:50:47

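I have data from an online survey in which respondents go through a loop of questions 1-3 times. The survey software (Qualtrics) records this data in multiple columns; that is, Q3.2 in the survey becomes the columns Q3.2.1., Q3.2.2., and Q3.2.3.: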

df <- data.frame(
id = 1:10,
time = as.Date('2009-01-01') + 0:9,
Q3.2.1. = rnorm(10, 0, 1),
Q3.2.2. = rnorm(10, 0, 1),
Q3.2.3. = rnorm(10, 0, 1),
Q3.3.1. = rnorm(10, 0, 1),
Q3.3.2. = rnorm(10, 0, 1),
Q3.3.3. = rnorm(10, 0, 1)
)

# Sample data

id time Q3.2.1. Q3.2.2. Q3.2.3. Q3.3.1. Q3.3.2. Q3.3.3.
1 1 2009-01-01 -0.2059165 -0.29177677 -0.7107192 1.52718069 -0.4484351 -1.21550600
2 2 2009-01-02 -0.1981136 -1.19813815 1.1750200 -0.40380049 -1.8376094 1.03588482
3 3 2009-01-03 0.3514795 -0.27425539 1.1171712 -1.02641801 -2.0646661 -0.35353058
...

I'd like to combine all the QN.N* columns into tidy individual QN.N columns, ultimately ending up with something like this:

   id       time loop_number        Q3.2        Q3.3
1 1 2009-01-01 1 -0.20591649 1.52718069
2 2 2009-01-02 1 -0.19811357 -0.40380049
3 3 2009-01-03 1 0.35147949 -1.02641801
...
11 1 2009-01-01 2 -0.29177677 -0.4484351
12 2 2009-01-02 2 -1.19813815 -1.8376094
13 3 2009-01-03 2 -0.27425539 -2.0646661
...
21 1 2009-01-01 3 -0.71071921 -1.21550600
22 2 2009-01-02 3 1.17501999 1.03588482
23 3 2009-01-03 3 1.11717121 -0.35353058
...

The tidyr library has the gather() function, which works well for combining one set of columns:

library(dplyr)
library(tidyr)
library(stringr)

df %>% gather(loop_number, Q3.2, starts_with("Q3.2")) %>%
mutate(loop_number = str_sub(loop_number,-2,-2)) %>%
select(id, time, loop_number, Q3.2)


id time loop_number Q3.2
1 1 2009-01-01 1 -0.20591649
2 2 2009-01-02 1 -0.19811357
3 3 2009-01-03 1 0.35147949
...
29 9 2009-01-09 3 -0.58581232
30 10 2009-01-10 3 -2.33393981
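As a small aside on the str_sub(loop_number, -2, -2) step above: it takes the second-to-last character of each column name, which is enough for loops 1-3. The sketch below compares it with a regex-based alternative that would also cope with a two-digit loop number. The column name Q3.2.10. is hypothetical and does not occur in the survey data described here.

```r
library(stringr)

# "Q3.2.10." is a made-up name used only to show the difference.
col_names <- c("Q3.2.1.", "Q3.2.3.", "Q3.2.10.")

# Position-based: always the second-to-last character.
by_position <- str_sub(col_names, -2, -2)

# Pattern-based: the run of digits between the last two dots.
by_pattern <- sub(".*\\.(\\d+)\\.$", "\\1", col_names)

by_position  # "1" "3" "0"
by_pattern   # "1" "3" "10"
```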

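The resulting data frame has 30 rows, as expected (10 individuals, 3 loops each). However, gathering a second set of columns does not work correctly. It does produce the two combined columns Q3.2 and Q3.3, but it ends up with 90 rows instead of 30 (every combination of 10 individuals, 3 loops of Q3.2, and 3 loops of Q3.3; in the real data the number of combinations grows substantially with each additional group of columns):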

df %>% gather(loop_number, Q3.2, starts_with("Q3.2")) %>% 
gather(loop_number, Q3.3, starts_with("Q3.3")) %>%
mutate(loop_number = str_sub(loop_number,-2,-2))


id time loop_number Q3.2 Q3.3
1 1 2009-01-01 1 -0.20591649 1.52718069
2 2 2009-01-02 1 -0.19811357 -0.40380049
3 3 2009-01-03 1 0.35147949 -1.02641801
...
89 9 2009-01-09 3 -0.58581232 -0.13187024
90 10 2009-01-10 3 -2.33393981 -0.48502131
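To see where the 90 rows come from, the row counts can be checked after each gather() call. This is only a diagnostic sketch: the key names loop_q32 and loop_q33 and the fixed seed are assumptions introduced here so that both loop labels stay visible, and they are not part of the original code.

```r
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(
  id = 1:10,
  time = as.Date("2009-01-01") + 0:9,
  Q3.2.1. = rnorm(10), Q3.2.2. = rnorm(10), Q3.2.3. = rnorm(10),
  Q3.3.1. = rnorm(10), Q3.3.2. = rnorm(10), Q3.3.3. = rnorm(10)
)

# First gather: 10 respondents x 3 loops of Q3.2.
step1 <- gather(df, loop_q32, Q3.2, starts_with("Q3.2"))

# Second gather: each of those 30 rows still carries all three Q3.3
# columns, so it is repeated once per Q3.3 loop.
step2 <- gather(step1, loop_q33, Q3.3, starts_with("Q3.3"))

# Rows where the Q3.2 loop and the Q3.3 loop are the same loop
# (the 6th character of a name such as "Q3.2.1." is the loop digit).
matched <- step2[substr(step2$loop_q32, 6, 6) == substr(step2$loop_q33, 6, 6), ]

nrow(step1)    # 30
nrow(step2)    # 90
nrow(matched)  # 30
```

Only the 30 rows in matched pair a Q3.2 value with the Q3.3 value from the same loop; the other 60 are cross-loop combinations.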

Is there a way to use multiple calls to gather() like this, combining small subsets of columns while keeping the correct number of rows?

Best answer

This approach seems pretty natural to me:

df %>%
gather(key, value, -id, -time) %>%
extract(key, c("question", "loop_number"), "(Q.\\..)\\.(.)") %>%
spread(question, value)

First gather all the question columns, use extract() to split the key into question and loop_number, then spread() question back into columns.

#>    id       time loop_number         Q3.2        Q3.3
#> 1 1 2009-01-01 1 0.142259203 -0.35842736
#> 2 1 2009-01-01 2 0.061034802 0.79354061
#> 3 1 2009-01-01 3 -0.525686204 -0.67456611
#> 4 2 2009-01-02 1 -1.044461185 -1.19662936
#> 5 2 2009-01-02 2 0.393808163 0.42384717
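For readers on a newer tidyr, the same gather / extract / spread sequence can probably be written as a single pivot_longer() call. This is a sketch rather than part of the original answer, and it assumes tidyr 1.1.0 or later (for names_transform) and column names of the form Q<digits>.<digits>.<digits>. as in the sample data. The special ".value" entry in names_to tells pivot_longer() to keep the first captured group (the question) as output column names, while the second group becomes loop_number.

```r
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(
  id = 1:10,
  time = as.Date("2009-01-01") + 0:9,
  Q3.2.1. = rnorm(10), Q3.2.2. = rnorm(10), Q3.2.3. = rnorm(10),
  Q3.3.1. = rnorm(10), Q3.3.2. = rnorm(10), Q3.3.3. = rnorm(10)
)

long <- pivot_longer(
  df,
  cols = -c(id, time),
  # First capture group -> output column names (Q3.2, Q3.3),
  # second capture group -> values of loop_number.
  names_to = c(".value", "loop_number"),
  names_pattern = "(Q\\d+\\.\\d+)\\.(\\d+)\\.",
  # Store the loop number as an integer instead of a character string.
  names_transform = list(loop_number = as.integer)
)

nrow(long)   # 30
names(long)  # "id" "time" "loop_number" "Q3.2" "Q3.3"
```

The result has one row per respondent and loop, so the Q3.2 and Q3.3 values from the same loop stay on the same row.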

Regarding r - Gather multiple sets of columns, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25925556/
