r - Fitting a different model to each row of a list-column data frame

Reposted by: 行者123  Updated: 2023-12-04 12:14:28

What is the best way to fit different model formulae that vary by row of a data frame, using the list-column data structure in the tidyverse?

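In R for Data Science, Hadley presents a great example of how to use the list-column data structure to fit many models easily ( http://r4ds.had.co.nz/many-models.html#gapminder ). I am trying to find a way to fit many models with slightly different formulae. In the example below, adapted from his original example, what is the best way to fit a different model for each continent?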

library(gapminder)
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

by_continent <- gapminder %>% 
  group_by(continent) %>% 
  nest()

by_continent <- by_continent %>% 
  mutate(model = map(data, ~lm(lifeExp ~ year, data = .)))

by_continent %>% 
  mutate(glance=map(model, glance)) %>% 
  unnest(glance, .drop=T)

## A tibble: 5 × 12
#  continent r.squared adj.r.squared     sigma statistic      p.value    df
#     <fctr>     <dbl>         <dbl>     <dbl>     <dbl>        <dbl> <int>
#1      Asia 0.4356350     0.4342026 8.9244419  304.1298 6.922751e-51     2
#2    Europe 0.4984659     0.4970649 3.8530964  355.8099 1.344184e-55     2
#3    Africa 0.2987543     0.2976269 7.6685811  264.9929 6.780085e-50     2
#4  Americas 0.4626467     0.4608435 6.8618439  256.5699 4.354220e-42     2
#5   Oceania 0.9540678     0.9519800 0.8317499  456.9671 3.299327e-16     2
## ... with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>,
##   deviance <dbl>, df.residual <int>

I know I can do this by iterating over by_continent (not efficient, since it estimates every model for every continent):

formulae <- list(
  Asia=~lm(lifeExp ~ year, data = .),
  Europe=~lm(lifeExp ~ year + pop, data = .),
  Africa=~lm(lifeExp ~ year + gdpPercap, data = .),
  Americas=~lm(lifeExp ~ year - 1, data = .),
  Oceania=~lm(lifeExp ~ year + pop + gdpPercap, data = .)
)

for (i in 1:nrow(by_continent)) {
  by_continent$model[[i]] <- map(by_continent$data, formulae[[i]])[[i]]
}

by_continent %>% 
  mutate(glance=map(model, glance)) %>% 
  unnest(glance, .drop=T)

## A tibble: 5 × 12
#  continent r.squared adj.r.squared     sigma  statistic       p.value    df
#     <fctr>     <dbl>         <dbl>     <dbl>      <dbl>         <dbl> <int>
#1      Asia 0.4356350     0.4342026 8.9244419   304.1298  6.922751e-51     2
#2    Europe 0.4984677     0.4956580 3.8584819   177.4093  3.186760e-54     3
#3    Africa 0.4160797     0.4141991 7.0033542   221.2506  2.836552e-73     3
#4  Americas 0.9812082     0.9811453 8.9703814 15612.1901 4.227928e-260     1
#5   Oceania 0.9733268     0.9693258 0.6647653   243.2719  6.662577e-16     4
## ... with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>,
##   deviance <dbl>, df.residual <int>

But is it possible to do this without falling back on a base R loop (and without fitting models I don't need)?

What I tried is something like this:

by_continent <- by_continent %>% 
left_join(tibble::enframe(formulae, name="continent", value="formula"))

by_continent %>% 
   mutate(model=map2(data, formula, est_model))

But I can't seem to come up with an est_model function that works. I tried this function, which does not work (h/t: https://gist.github.com/multidis/8138757 ):

est_model <- function(data, formula, ...) {
  mc <- match.call()
  m <- match(c("formula","data"), names(mc), 0L)
  mf <- mc[c(1L, m)]
  mf[[1L]] <- as.name("model.frame")
  mf <- eval(mf, parent.frame())
  data.st <- data.frame(mf)

  return(data.st)
}

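(Admittedly, this is a contrived example. In my actual case, a substantial number of observations in my data are missing key independent variables, so I want to fit one model with all variables on the complete observations and another with only a subset of the variables on the remaining observations.)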

Update

I came up with an est_model function that works (though it is probably not efficient):

est_model <- function(data, formula, ...) {
  map(list(data), formula, ...)[[1]]
}

by_continent <- by_continent %>% 
   mutate(model=map2(data, formula, est_model))

by_continent %>% 
  mutate(glance=map(model, glance)) %>% 
  unnest(glance, .drop=T)

## A tibble: 5 × 12
#  continent r.squared adj.r.squared     sigma  statistic       p.value    df
#      <chr>     <dbl>         <dbl>     <dbl>      <dbl>         <dbl> <int>
#1      Asia 0.4356350     0.4342026 8.9244419   304.1298  6.922751e-51     2
#2    Europe 0.4984677     0.4956580 3.8584819   177.4093  3.186760e-54     3
#3    Africa 0.4160797     0.4141991 7.0033542   221.2506  2.836552e-73     3
#4  Americas 0.9812082     0.9811453 8.9703814 15612.1901 4.227928e-260     1
#5   Oceania 0.9733268     0.9693258 0.6647653   243.2719  6.662577e-16     4
## ... with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>, deviance <dbl>,
##   df.residual <int>
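
A note on why the wrapper in the update works: map() turns a one-sided formula such as ~lm(lifeExp ~ year, data = .) into a function via purrr::as_mapper(), so wrapping the data in a one-element list only serves to trigger that conversion. A minimal sketch of the same idea that calls as_mapper() directly is below. The helper name fit_fun and the use of the built-in mtcars data are assumptions for illustration, not part of the original post.

```r
library(purrr)

# Hypothetical rewrite of est_model: convert the lambda-style formula
# to a function, then call it on the data directly (no one-element list).
est_model <- function(data, formula, ...) {
  fit_fun <- as_mapper(formula)  # one-sided formula -> function of `.`
  fit_fun(data, ...)
}

# Usage on a stand-in data set (mtcars, not the gapminder data).
fit <- est_model(mtcars, ~ lm(mpg ~ wt, data = .))
```

This should drop into the map2(data, formula, est_model) call unchanged, since the argument order is the same.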

Best answer

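I find it easier to make a list of model formulae. Each model is fit only once, for its corresponding continent. I add a new column, formula, to the nested data so that each formula is matched to its continent by name, in case the two are not in the same order.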

formulae <- c(
    Asia= lifeExp ~ year,
    Europe= lifeExp ~ year + pop,
    Africa= lifeExp ~ year + gdpPercap,
    Americas= lifeExp ~ year - 1,
    Oceania= lifeExp ~ year + pop + gdpPercap
)

df <- gapminder %>%
    group_by(continent) %>%
    nest() %>%
    mutate(formula = formulae[as.character(continent)]) %>%
    mutate(model = map2(formula, data, ~ lm(.x, .y))) %>%
    mutate(glance=map(model, glance)) %>%
    unnest(glance, .drop=T)

# # A tibble: 5 × 12
#   continent r.squared adj.r.squared     sigma  statistic       p.value    df      logLik        AIC        BIC
#      <fctr>     <dbl>         <dbl>     <dbl>      <dbl>         <dbl> <int>       <dbl>      <dbl>      <dbl>
# 1      Asia 0.4356350     0.4342026 8.9244419   304.1298  6.922751e-51     2 -1427.65947 2861.31893 2873.26317
# 2    Europe 0.4984677     0.4956580 3.8584819   177.4093  3.186760e-54     3  -995.41016 1998.82033 2014.36475
# 3    Africa 0.4160797     0.4141991 7.0033542   221.2506  2.836552e-73     3 -2098.46089 4204.92179 4222.66639
# 4  Americas 0.9812082     0.9811453 8.9703814 15612.1901 4.227928e-260     1 -1083.35918 2170.71836 2178.12593
# 5   Oceania 0.9733268     0.9693258 0.6647653   243.2719  6.662577e-16     4   -22.06696   54.13392   60.02419
# # ... with 2 more variables: deviance <dbl>, df.residual <int>
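
The same pattern might carry over to the situation described at the end of the question, where some observations are missing a key predictor. The sketch below is an illustration under stated assumptions, not the asker's actual setup: it uses the built-in airquality data as a stand-in (Solar.R is missing for some rows), and the column name group and the two formulae are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Assumed stand-in data: airquality, where Solar.R has missing values.
# One formula per subset, looked up by name rather than by position.
formulae <- list(
  complete   = Temp ~ Wind + Solar.R,  # rows where every predictor is present
  incomplete = Temp ~ Wind             # rows where Solar.R is missing
)

fits <- airquality %>%
  mutate(group = if_else(is.na(Solar.R), "incomplete", "complete")) %>%
  nest(data = -group) %>%
  mutate(
    formula = formulae[group],                           # match formula by name
    model   = map2(formula, data, ~ lm(.x, data = .y))   # one fit per row
  )

# One row of fit statistics per subset.
model_summaries <- fits %>%
  mutate(glance = map(model, broom::glance)) %>%
  select(group, glance) %>%
  unnest(glance)
```

The summary step uses select() before unnest() because the .drop argument of unnest() is deprecated as of tidyr 1.0.0; on older versions, unnest(glance, .drop = TRUE) as written in the answer does the same job.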

Regarding r - Fitting a different model to each row of a list-column data frame, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/41404198/
