
r - How to use map* and mutate to convert a list into a set of additional columns?

Reposted. Author: 行者123. Last updated: 2023-12-03 23:49:14

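I have tried hundreds of permutations of this code to get a function that does what I want, and I have finally given up. It feels like it should definitely be doable, and that I am very close!

I have tried to boil this down to its essentials in the reprex below.

Basically I have a single-row data frame, and a list of strings (the "concepts") that goes with it. I want to use mutate to create an additional column for each of those strings, ideally with the column name taken from the string, and then populate that column with the result of a function call. (Which function doesn't matter for now; I just need the infrastructure to work.)

I feel that, as usual, I must be missing something obvious... perhaps just a syntax error.
I also wonder whether I need purrr::map at all; maybe a simpler vectorised mapping would work just fine.

I feel that the fact the new columns are named ..1 rather than the concept names is a bit of a hint as to what is going wrong.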
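One possible explanation for the ..1 name, offered as a sketch rather than a confirmed diagnosis: as far as I know, purrr builds ~ lambdas with rlang::as_function(), which gives the resulting function an argument .x whose default is ..1. Embracing with {{ }} captures the expression behind an argument rather than its value, so the column label ends up as ..1. The tibble df_demo and the name nm below are made up for illustration, and the "{nm}" form assumes a reasonably recent dplyr (1.0 or later).

```r
library(rlang)
library(dplyr)

# A formula lambda is a function whose .x argument defaults to ..1
f <- as_function(~ toupper(.x))
dot_x_default <- formals(f)$.x
print(dot_x_default)

# Hypothetical one-row tibble and column name, for illustration only
df_demo <- tibble(id = 1)
nm <- "time"

# Unquoting the string with !! uses its value as the column name
out_bang <- mutate(df_demo, !!nm := toupper(nm))

# Glue-style names are an alternative spelling of the same idea
out_glue <- mutate(df_demo, "{nm}" := toupper(nm))

names(out_bang)
names(out_glue)
```

Either form names the column after the string's value, whereas {{ }} is intended for forwarding a function argument that refers to a data frame column.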

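I can create the data frame I want by calling each concept manually (see the end of the reprex), but since the list of concepts differs between data frames, I want to do this with pipes and tidyverse techniques rather than by hand.

I have read the following questions looking for help: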

  • How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs
  • How to mutate multiple columns with dynamic variable using purrr:map function?
  • (R) Cleaner way to use map() with list-columns
  • Add multiple output variables using purrr and a predefined function
  • Creating new variables with purrr (how does one go about that?)
  • How to compute multiple new columns in a R dataframe with dynamic names

But none of these has helped me solve the problem I am facing. [Edit: added the last question to that list; it may be the technique I need.]
    <!-- language-all: lang-r -->


    # load packages -----------------------------------------------------------

    library(rlang)
    library(dplyr)
    library(tidyr)
    library(magrittr)
    library(purrr)
    library(nomisr)



    # set up initial list of tibbles ------------------------------------------

    df <- list(
      district_population = tibble(
        dataset_title = "Population estimates - local authority based by single year",
        dataset_id = "NM_2002_1"
      ),
      jsa_claimants = tibble(
        dataset_title = "Jobseeker's Allowance with rates and proportions",
        dataset_id = "NM_1_1"
      )
    )


    # just use the first tibble for now, for testing --------------------------
    # ideally I want to map across dfs through a list -------------------------

    df <- df[[1]]

    # nitty gritty functions --------------------------------------------------

    get_concept_list <- function(df) {
      dataset_id <- pluck(df, "dataset_id")
      nomis_overview(id = dataset_id,
                     select = c("dimensions", "codes")) %>%
        pluck("value", 1, "dimension") %>%
        filter(concept != "geography") %>%
        pull("concept")
    }

    # get_concept_list() returns the strings I need:
    get_concept_list(df)
    #> [1] "time" "gender" "c_age" "measures"

    # Here is a list of examples of types of map* that do various things,
    # none of which is what I need it to do
    # I'm using toupper() here for simplicity - ultimately I will use
    # get_concept_info() to populate the new columns

    # this creates four new tibbles
    get_concept_list(df) %>%
      map(~ mutate(df, {{.x}} := toupper(.x)))
    #> [[1]]
    #> # A tibble: 1 x 3
    #> dataset_title dataset_id ..1
    #> <chr> <chr> <chr>
    #> 1 Population estimates - local authority based by single year NM_2002_1 TIME
    #>
    #> [[2]]
    #> # A tibble: 1 x 3
    #> dataset_title dataset_id ..1
    #> <chr> <chr> <chr>
    #> 1 Population estimates - local authority based by single year NM_2002_1 GENDER
    #>
    #> [[3]]
    #> # A tibble: 1 x 3
    #> dataset_title dataset_id ..1
    #> <chr> <chr> <chr>
    #> 1 Population estimates - local authority based by single year NM_2002_1 C_AGE
    #>
    #> [[4]]
    #> # A tibble: 1 x 3
    #> dataset_title dataset_id ..1
    #> <chr> <chr> <chr>
    #> 1 Population estimates - local authority based by single year NM_2002_1 MEASUR~

    # this throws an error
    get_concept_list(df) %>%
      map_chr(~ mutate(df, {{.x}} := toupper(.x)))
    #> Error: Result 1 must be a single string, not a vector of class `tbl_df/tbl/data.frame` and of length 3

    # this creates three extra rows in the tibble
    get_concept_list(df) %>%
      map_df(~ mutate(df, {{.x}} := toupper(.x)))
    #> # A tibble: 4 x 3
    #> dataset_title dataset_id ..1
    #> <chr> <chr> <chr>
    #> 1 Population estimates - local authority based by single year NM_2002_1 TIME
    #> 2 Population estimates - local authority based by single year NM_2002_1 GENDER
    #> 3 Population estimates - local authority based by single year NM_2002_1 C_AGE
    #> 4 Population estimates - local authority based by single year NM_2002_1 MEASUR~

    # this does the same as map_df
    get_concept_list(df) %>%
      map_dfr(~ mutate(df, {{.x}} := toupper(.x)))
    #> # A tibble: 4 x 3
    #> dataset_title dataset_id ..1
    #> <chr> <chr> <chr>
    #> 1 Population estimates - local authority based by single year NM_2002_1 TIME
    #> 2 Population estimates - local authority based by single year NM_2002_1 GENDER
    #> 3 Population estimates - local authority based by single year NM_2002_1 C_AGE
    #> 4 Population estimates - local authority based by single year NM_2002_1 MEASUR~

    # this creates a single tibble 12 columns wide
    get_concept_list(df) %>%
      map_dfc(~ mutate(df, {{.x}} := toupper(.x)))
    #> # A tibble: 1 x 12
    #> dataset_title dataset_id ..1 dataset_title1 dataset_id1 ..11 dataset_title2
    #> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
    #> 1 Population e~ NM_2002_1 TIME Population es~ NM_2002_1 GEND~ Population es~
    #> # ... with 5 more variables: dataset_id2 <chr>, ..12 <chr>,
    #> # dataset_title3 <chr>, dataset_id3 <chr>, ..13 <chr>

    # function to get info on each concept (except geography) -----------------
    # this is the function I want to use eventually to populate my new columns

    get_concept_info <- function(df, concept_name) {
      dataset_id <- pluck(df, "dataset_id")
      nomis_overview(id = dataset_id) %>%
        filter(name == "dimensions") %>%
        pluck("value", 1, "dimension") %>%
        filter(concept == concept_name) %>%
        pluck("codes.code", 1) %>%
        select(name, value) %>%
        nest(data = everything()) %>%
        as.list() %>%
        pluck("data")
    }


    # individual mutate works, for comparison ---------------------------------
    # I can create the kind of table I want manually using a line like the one below

    # df %>% map(~ mutate(., measures = get_concept_info(., concept_name = "measures")))
    df %>% mutate(., measures = get_concept_info(df, "measures"))
    #> # A tibble: 1 x 3
    #> dataset_title dataset_id measures
    #> <chr> <chr> <list>
    #> 1 Population estimates - local authority based by sin~ NM_2002_1 <tibble [2 x ~

    <sup>Created on 2020-02-10 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>

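Best answer

Using !! together with := lets you name columns dynamically. We can then reduce() the list output of map(); reduce() left_join()s all the data frames in the list on the dataset title and id columns.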

    df_2 <-
      map(get_concept_list(df),
          ~ mutate(df,
                   !!.x := get_concept_info(df, .x))) %>%
      reduce(left_join, by = c("dataset_title", "dataset_id"))

    df_2

    #> # A tibble: 1 x 6
    #>   dataset_title                                               dataset_id time           gender         c_age          measures
    #>   <chr>                                                       <chr>      <list<df[,2]>> <list<df[,2]>> <list<df[,2]>> <list<df[,2]>>
    #> 1 Population estimates - local authority based by single year NM_2002_1  [28 x 2]       [3 x 2]        [121 x 2]      [2 x 2]
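Because get_concept_info() needs network access to the Nomis API, the same pattern can be sketched offline. The example below is a minimal sketch under stated assumptions, not the answerer's exact code: fake_concept_info() is a hypothetical stand-in that returns a one-element list holding a small tibble, and the concept names are copied from the question's output. It also shows a second option, splicing a named list into mutate() with !!!, which avoids the joins.

```r
library(dplyr)
library(purrr)

# One-row tibble shaped like the one in the question
df <- tibble(
  dataset_title = "Population estimates - local authority based by single year",
  dataset_id = "NM_2002_1"
)
concepts <- c("time", "gender", "c_age", "measures")

# Hypothetical stand-in for get_concept_info(): a list holding one tibble,
# so that mutate() stores it as a list-column
fake_concept_info <- function(df, concept_name) {
  list(tibble(name = toupper(concept_name), value = nchar(concept_name)))
}

# Approach 1: one mutated copy of df per concept, then join them together
wide <- map(concepts, function(nm) {
  mutate(df, !!nm := fake_concept_info(df, nm))
}) %>%
  reduce(left_join, by = c("dataset_title", "dataset_id"))

# Approach 2: build a named list of columns and splice it in with !!!
new_cols <- concepts %>%
  set_names() %>%
  map(~ fake_concept_info(df, .x))
wide2 <- mutate(df, !!!new_cols)

names(wide)
names(wide2)
```

Both approaches should give one row with a list-column per concept. The splice version may be easier to reason about when the key columns are not guaranteed to be unique.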

Regarding "r - How to use map* and mutate to convert a list into a set of additional columns?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/60155799/
