r - Double nesting in the tidyverse

Reposted by: 行者123 Last updated: 2023-12-02 15:34:39


Using the examples from Wickham's introduction to purrr in R for Data Science, I am trying to create a doubly nested list.

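library(gapminder)
library(dplyr)   # group_by() and the %>% pipe
library(purrr)
library(tidyr)
gapminder
# .key is given as a string, which is what current tidyr expects
nest_data <- gapminder %>% group_by(continent) %>% nest(.key = "by_continent")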

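How can I nest further by country, so that nest_data contains by_continent and a new nesting level, by_country, which ultimately holds the by_year tibble?

Also, once this data structure has been created for the gapminder data, how would you run the regression model example from the book chapter for each country?

Best answer

My solution, with some explanation below.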

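library(gapminder)
library(dplyr)     # group_by(), mutate(), filter(), slice()
library(purrr)
library(tidyr)
library(broom)
library(magrittr)  # %$%, %<>%, extract2()
library(ggplot2)   # plots further down

# .key is given as a string (what current tidyr expects); ungroup() drops the
# grouping that newer tidyr versions keep after nest()
nest_data <- gapminder %>% group_by(continent) %>%
  nest(.key = "by_continent") %>% ungroup()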

First question: how to nest by_country inside the already nested by_continent

@aosmith provided a good solution in the comments

nested_again <-
  nest_data %>% mutate(by_continent = map(by_continent, ~ .x %>%
                                            group_by(country) %>%
                                            nest(.key = "by_country") %>%
                                            ungroup()))
# Level 1
nested_again
# # A tibble: 5 × 2
# continent      by_continent
# <fctr>            <list>
#   1      Asia <tibble [33 × 2]>
#   2    Europe <tibble [30 × 2]>
#   3    Africa <tibble [52 × 2]>
#   4  Americas <tibble [25 × 2]>
#   5   Oceania  <tibble [2 × 2]>

# Level 2
nested_again %>% unnest(by_continent) %>% slice(1:2)
# # A tibble: 2 × 3
# continent     country        by_country
# <fctr>      <fctr>            <list>
#   1      Asia Afghanistan <tibble [12 × 4]>
#   2      Asia     Bahrain <tibble [12 × 4]>
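
With tidyr 1.0 or later, the same two-level structure can also be written with the named-column form of `nest()`. The following is only a minimal sketch on a made-up toy tibble rather than gapminder; `toy`, its values and `nested_toy` are assumptions for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy stand-in for gapminder (hypothetical values, for illustration only).
toy <- tibble(
  continent = rep(c("A", "B"), times = c(6, 3)),
  country   = rep(c("a1", "a2", "b1"), each = 3),
  year      = rep(c(2000, 2005, 2010), times = 3),
  lifeExp   = c(60, 62, 63, 70, 71, 73, 50, 55, 57)
)

nested_toy <- toy %>%
  # Level 1: one row per continent; all other columns go into `by_continent`.
  nest(by_continent = -continent) %>%
  # Level 2: inside each continent, one row per country.
  mutate(by_continent = map(by_continent, ~ nest(.x, by_country = -country)))

nested_toy
nested_toy$by_continent[[1]]
```

Here `nest(new_col = -key)` means "pack every column except `key` into `new_col`", so no `group_by()` is needed and the result is not grouped.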

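Second question: how to apply a regression model at the deeper level (I assume the models should be stored in the tibble)

@aosmith's solution (I'll call it sol1)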

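# at_depth() is no longer available in current purrr; map_depth() applies
# the function at the same depth
sol1 <- mutate(nested_again, models = map(by_continent, "by_country") %>%
         map_depth(2, ~lm(lifeExp ~ year, data = .x)))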

sol1
# # A tibble: 5 × 3
# continent      by_continent      models
# <fctr>            <list>      <list>
#   1      Asia <tibble [33 × 2]> <list [33]>
#   2    Europe <tibble [30 × 2]> <list [30]>
#   3    Africa <tibble [52 × 2]> <list [52]>
#   4  Americas <tibble [25 × 2]> <list [25]>
#   5   Oceania  <tibble [2 × 2]>  <list [2]>

sol1 %>% unnest(models)
# Error: Each column must either be a list of vectors or a list of data frames [models]
sol1 %>% unnest(by_continent) %>% slice(1:2)
# # A tibble: 2 × 3
#   continent     country        by_country
#      <fctr>      <fctr>            <list>
# 1      Asia Afghanistan <tibble [12 × 4]>
# 2      Asia     Bahrain <tibble [12 × 4]>

The solution does what it is supposed to do, but there is no easy way to filter by country, because that information is nested in level 2.
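
One possible workaround, which is not part of either original solution, is to name each fitted model after its country so that it can be looked up by name. A minimal sketch on made-up toy data (the `toy` tibble, its values and the `with_models` name are assumptions for illustration):

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy stand-in for gapminder (hypothetical values, for illustration only).
toy <- tibble(
  continent = rep(c("A", "B"), times = c(6, 3)),
  country   = rep(c("a1", "a2", "b1"), each = 3),
  year      = rep(c(2000, 2005, 2010), times = 3),
  lifeExp   = c(60, 62, 63, 70, 71, 73, 50, 55, 57)
)

# Two-level nesting: continent, then country.
nested_toy <- toy %>%
  nest(by_continent = -continent) %>%
  mutate(by_continent = map(by_continent, ~ nest(.x, by_country = -country)))

# Fit one model per country and name each model after its country,
# so the label is not lost when the models live in a plain list.
with_models <- nested_toy %>%
  mutate(models = map(by_continent, function(cont) {
    fits <- map(cont$by_country, function(d) lm(lifeExp ~ year, data = d))
    set_names(fits, cont$country)
  }))

# Look a model up by continent row and country name.
coef(with_models$models[[1]][["a2"]])
```

This still requires knowing which continent row a country sits in, so it only partly addresses the filtering problem.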

Building on @aosmith's solution to the first question, I came up with solution 2:

sol2<-nested_again %>% mutate(by_continent = map(by_continent, ~.x %>% 
                  mutate(models = map(by_country, ~lm(lifeExp ~ year, data = .x) )) ))
sol2
# # A tibble: 5 × 2
#   continent      by_continent
#      <fctr>            <list>
# 1      Asia <tibble [33 × 4]>
# 2    Europe <tibble [30 × 4]>
# 3    Africa <tibble [52 × 4]>
# 4  Americas <tibble [25 × 4]>
# 5   Oceania  <tibble [2 × 4]>

sol2 %>% unnest(by_continent) %>% slice(1:2)
# # A tibble: 2 × 4
#   continent     country        by_country   models
#      <fctr>      <fctr>            <list>   <list>
# 1      Asia Afghanistan <tibble [12 × 4]> <S3: lm>
# 2      Asia     Bahrain <tibble [12 × 4]> <S3: lm>

sol2 %>% unnest(by_continent) %>% unnest(by_country) %>% colnames
# [1] "continent" "country"   "year"      "lifeExp"   "pop"      
# [6] "gdpPercap"

# get model by specific country
sol2 %>% unnest(by_continent) %>% filter(country == "Brazil") %$% models %>% extract2(1)
# Call:
#   lm(formula = lifeExp ~ year, data = .x)
# 
# Coefficients:
#   (Intercept)         year  
# -709.9427       0.3901

# summary with broom::tidy
sol2 %>% unnest(by_continent) %>% filter(country == "Brazil") %$% models %>%
  extract2(1) %>% tidy
#          term     estimate    std.error statistic      p.value
# 1 (Intercept) -709.9426860 10.801042821 -65.72909 1.617791e-14
# 2        year    0.3900895  0.005456243  71.49417 6.990433e-15

We can tidy all the models and keep the results for plotting or filtering

sol2 %<>% mutate(by_continent = map(by_continent, ~.x %>% 
        mutate(tidymodels = map(models, tidy )) ))

sol2 %>% unnest(by_continent) %>% unnest(tidymodels) %>% 
  ggplot(aes(country,p.value,colour=continent))+geom_point()+
  facet_wrap(~continent)+
  theme(axis.text.x = element_blank())

selc <- sol2 %>% unnest(by_continent) %>% unnest(tidymodels) %>% filter(p.value > 0.05) %>% 
  select(country) %>% unique %>% extract2(1)

gapminder %>% filter(country %in% selc ) %>%
  ggplot(aes(year,lifeExp,colour=continent))+geom_line(aes(group=country))+
  facet_wrap(~continent)

aaaaand, we can use the models

m1 <- sol2 %>% unnest(by_continent) %>% slice(1) %$% models %>% extract2(1)

x <- sol2 %>% unnest(by_continent) %>% slice(1) %>% unnest(by_country) %>% select(year)

pred1 <- data.frame(year = x, lifeExp = predict.lm(m1,x))

sol2 %>% unnest(by_continent) %>% slice(1) %>% unnest(by_country) %>%
  ggplot(aes(year, lifeExp )) + geom_point() +
  geom_line(data=pred1)

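In this case there is really no good reason to use this double nesting (other than learning how to do it, of course), but I have found a very valuable use case at work: when you need a function to operate at the third level, grouped by levels 1 and 2, with the result saved at level 2. Of course, we could also use a for loop over level 1 for that, but where is the fun in that ;) I'm not sure how this "nested" map performs compared with a for loop + map, but I will test that next.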
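
To illustrate the point that double nesting is not strictly needed for per-country models, a single-level nest keyed on both continent and country keeps the country label on every row. This is a rough sketch on made-up toy data using the tidyr 1.0+ `nest()` syntax; `toy`, its values, and the `by_both`, `model` and `slope` names are assumptions for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy stand-in for gapminder (hypothetical values, for illustration only).
toy <- tibble(
  continent = rep(c("A", "B"), times = c(6, 3)),
  country   = rep(c("a1", "a2", "b1"), each = 3),
  year      = rep(c(2000, 2005, 2010), times = 3),
  lifeExp   = c(60, 62, 63, 70, 71, 73, 50, 55, 57)
)

by_both <- toy %>%
  # One row per continent/country pair; remaining columns go into `data`.
  nest(data = -c(continent, country)) %>%
  mutate(
    model = map(data, ~ lm(lifeExp ~ year, data = .x)),
    slope = map_dbl(model, ~ coef(.x)[["year"]])
  )

# Filtering by country is now an ordinary filter() call.
by_both %>% filter(country == "a2")
```

The trade-off is that anything computed per continent then needs a separate `group_by(continent)` step instead of living in its own nesting level.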

Benchmark

It looks like there is not much difference

# comparison map_map with for_map
map_map <- function(nested_again) {
  nested_again %>% mutate(by_continent = map(by_continent, ~.x %>%
    mutate(models = map(by_country, ~lm(lifeExp ~ year, data = .x)))))
}

for_map <- function(nested_again) {
  for (i in seq_along(nested_again$by_continent)) {
    nested_again$by_continent[[i]] %<>%
      mutate(models = map(by_country, ~lm(lifeExp ~ year, data = .x)))
  }
  nested_again  # return the modified copy; a bare for loop returns NULL
}

res<-microbenchmark::microbenchmark(
  mm<-map_map(nested_again), fm<-for_map(nested_again) )

res
# Unit: milliseconds
#                         expr      min       lq     mean   median       uq      max neval cld
#  mm <- map_map(nested_again) 121.0033 144.5530 160.6785 155.2389 174.2915 240.2012   100   a
#  fm <- for_map(nested_again) 131.4312 148.3329 164.7097 157.6589 173.6480 455.7862   100   a

autoplot(res)

Regarding "r - Double nesting in the tidyverse", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39228502/
