
r - Nesting multiple groups of columns in a data frame

Reposted. Author: 行者123. Updated: 2023-12-04 10:32:08

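The idea of nesting multiple columns into a single list column is very powerful. However, I am not sure whether the nest function in {tidyr} can nest more than one group of columns into multiple list columns within the same pipeline. For example, suppose I have the following data frame: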

df <- as.data.frame(replicate(6, runif(10) * 100))

colnames(df) <- c(
  paste0("a", 1:2), # a1, a2
  paste0("b", 1:4)  # b1, b2, b3, b4
)

df
a1 a2 b1 b2 b3 b4
1 20.807348 69.339482 91.837151 99.76813 3.394350 33.780049
2 64.667733 20.676381 80.523369 38.42774 85.635208 60.111491
3 55.352501 55.699571 4.812923 38.65333 98.869203 80.345576
4 45.194094 16.511696 83.834651 51.48698 7.191081 16.697210
5 66.401642 89.041055 26.965636 67.90061 90.622428 59.552935
6 35.750100 55.997766 49.768556 68.45900 67.523080 58.993232
7 21.392823 5.335281 56.348328 35.68331 51.029617 66.290035
8 8.851236 19.486580 14.199370 22.49754 14.617592 18.236406
9 70.475652 6.229997 43.169364 12.63378 21.415589 2.163004
10 47.837613 37.641530 38.001288 71.15896 71.000568 2.135611

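I want to nest the "a" columns into one list column and the "b" columns into a second list column, because I want to perform different calculations on them.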

Nesting the "a" columns works:
library(tidyr)
nest(df, a1, a2, .key = "a")

b1 b2 b3 b4 a
1 91.837151 99.76813 3.394350 33.780049 20.80735, 69.33948
2 80.523369 38.42774 85.635208 60.111491 64.66773, 20.67638
3 4.812923 38.65333 98.869203 80.345576 55.35250, 55.69957
4 83.834651 51.48698 7.191081 16.697210 45.19409, 16.51170
5 26.965636 67.90061 90.622428 59.552935 66.40164, 89.04105
6 49.768556 68.45900 67.523080 58.993232 35.75010, 55.99777
7 56.348328 35.68331 51.029617 66.290035 21.392823, 5.335281
8 14.199370 22.49754 14.617592 18.236406 8.851236, 19.486580
9 43.169364 12.63378 21.415589 2.163004 70.475652, 6.229997
10 38.001288 71.15896 71.000568 2.135611 47.83761, 37.64153

But nesting the "b" columns after nesting the "a" columns is not possible:
nest(df, a1, a2, .key = "a") %>%
  nest(b1, b2, b3, b4, .key = "b")
Error in grouped_df_impl(data, unname(vars), drop) :
Column `a` can't be used as a grouping variable because it's a list

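Reading the error message, this makes sense: the columns that are not being nested are used as grouping variables, and the list column "a" cannot be one.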

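My workaround is:
  • Nest the "a" columns
  • Perform the required calculations on the "a" list column
  • Unnest the "a" list column
  • Nest the "b" columns
  • Perform the required calculations on the "b" list column
  • Unnest the "b" list column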

Is there a more direct way to achieve this? Thank you very much for your help.

    Best answer

    We can use map to do this: build one list column per column prefix, then bind the results together.

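    library(tidyverse)
    # For each prefix, select the matching columns and nest them into a
    # list column named after that prefix; then bind the pieces side by side
    out <- list('a', 'b') %>%
      map(~ df %>%
            select(matches(.x)) %>%
            nest(names(.), .key = !! rlang::sym(.x))) %>%
      bind_cols
    out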
    # A tibble: 1 x 2
    # a b
    # <list> <list>
    #1 <data.frame [10 × 2]> <data.frame [10 × 4]>


    out %>%
    unnest
    # A tibble: 10 x 6
    # a1 a2 b1 b2 b3 b4
    # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    # 1 20.8 69.3 91.8 99.8 3.39 33.8
    # 2 64.7 20.7 80.5 38.4 85.6 60.1
    # 3 55.4 55.7 4.81 38.7 98.9 80.3
    # 4 45.2 16.5 83.8 51.5 7.19 16.7
    # 5 66.4 89.0 27.0 67.9 90.6 59.6
    # 6 35.8 56.0 49.8 68.5 67.5 59.0
    # 7 21.4 5.34 56.3 35.7 51.0 66.3
    # 8 8.85 19.5 14.2 22.5 14.6 18.2
    # 9 70.5 6.23 43.2 12.6 21.4 2.16
    #10 47.8 37.6 38.0 71.2 71.0 2.14
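On tidyr 1.0.0 or later, the same two list columns can probably be built with a single nest() call, because that interface takes name = columns pairs. This is a minimal sketch under that assumption, not the answer author's code; the set.seed(1) call and the object names nested and flat are additions for illustration.

```r
library(tidyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- as.data.frame(replicate(6, runif(10) * 100))
colnames(df) <- c(paste0("a", 1:2), paste0("b", 1:4))

# One call, two list columns: each `name = c(columns)` pair becomes a list column
nested <- nest(df, a = c(a1, a2), b = c(b1, b2, b3, b4))

# Name the list columns explicitly when unnesting
flat <- unnest(nested, cols = c(a, b))
```

Because no columns are left outside the two groups, nested has a single row, the same shape as out above.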

    We can then perform separate calculations on the 'a' and 'b' list columns, for example multiplying only the 'a' columns by 4:
    out %>%
    mutate(a = map(a, `*`, 4)) %>%
    unnest
    # A tibble: 10 x 6
    # a1 a2 b1 b2 b3 b4
    # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    # 1 83.2 277. 91.8 99.8 3.39 33.8
    # 2 259. 82.7 80.5 38.4 85.6 60.1
    # 3 221. 223. 4.81 38.7 98.9 80.3
    # 4 181. 66.0 83.8 51.5 7.19 16.7
    # 5 266. 356. 27.0 67.9 90.6 59.6
    # 6 143. 224. 49.8 68.5 67.5 59.0
    # 7 85.6 21.3 56.3 35.7 51.0 66.3
    # 8 35.4 77.9 14.2 22.5 14.6 18.2
    # 9 282. 24.9 43.2 12.6 21.4 2.16
    #10 191. 151. 38.0 71.2 71.0 2.14
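To make "separate calculations" concrete, the sketch below applies a different function to each list column in one pipeline. It assumes tidyr 1.0.0 or later for the nest(name = c(...)) form; rounding the "b" columns is a hypothetical second calculation chosen only for illustration, as is set.seed(1).

```r
library(tidyr)
library(dplyr)
library(purrr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- as.data.frame(replicate(6, runif(10) * 100))
colnames(df) <- c(paste0("a", 1:2), paste0("b", 1:4))

result <- df %>%
  nest(a = c(a1, a2), b = c(b1, b2, b3, b4)) %>%
  mutate(
    a = map(a, ~ .x * 4),     # multiply every "a" column by 4
    b = map(b, ~ round(.x))   # hypothetical: round every "b" column
  ) %>%
  unnest(cols = c(a, b))
```

Each map() call sees only the columns of its own group, which is what makes per-group calculations straightforward.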

    That said, instead of going through nest/unnest, it is also possible to use mutate_at to select the columns of interest:
    df %>% 
    mutate_at(vars(matches('^a\\d+')), funs(.*4))
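funs() was deprecated in dplyr 0.8.0, and across() (dplyr 1.0.0 or later) covers what mutate_at() did. A roughly equivalent sketch, assuming a recent dplyr, could look like the following; the name scaled and the set.seed(1) call are illustrative additions.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- as.data.frame(replicate(6, runif(10) * 100))
colnames(df) <- c(paste0("a", 1:2), paste0("b", 1:4))

# Multiply only the columns named a<digits> by 4; the "b" columns are untouched
scaled <- df %>%
  mutate(across(matches("^a\\d+"), ~ .x * 4))
```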

    Regarding "r - Nesting multiple groups of columns in a data frame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53344067/
