为列表数据框的每一行安装不同的模型

根据数据框的行与tidyverse中的list-columns数据结构的不同，适合不同模型公式的最佳方法是什么？为列表数据框的每一行安装不同的模型

在R for Data Science中，Hadley提供了一个很好的例子，说明如何使用列列数据结构并轻松地适应多个模型（http://r4ds.had.co.nz/many-models.html#gapminder）。我正试图找到一种方式来适应许多具有稍微不同公式的模型。在下面的例子中，从他原来的例子改编而来，在每个大陆适合不同模型的最佳方法是什么？

library(gapminder) 
library(dplyr) 
library(tidyr) 
library(purrr) 
library(broom) 

by_continent <- gapminder %>% 
    group_by(continent) %>% 
    nest() 

by_continent <- by_continent %>% 
    mutate(model = map(data, ~lm(lifeExp ~ year, data = .))) 

by_continent %>% 
    mutate(glance=map(model, glance)) %>% 
    unnest(glance, .drop=T) 

## A tibble: 5 × 12 
# continent r.squared adj.r.squared  sigma statistic  p.value df 
#  <fctr>  <dbl>   <dbl>  <dbl>  <dbl>  <dbl> <int> 
#1  Asia 0.4356350  0.4342026 8.9244419 304.1298 6.922751e-51  2 
#2 Europe 0.4984659  0.4970649 3.8530964 355.8099 1.344184e-55  2 
#3 Africa 0.2987543  0.2976269 7.6685811 264.9929 6.780085e-50  2 
#4 Americas 0.4626467  0.4608435 6.8618439 256.5699 4.354220e-42  2 
#5 Oceania 0.9540678  0.9519800 0.8317499 456.9671 3.299327e-16  2 
## ... with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>, 
## deviance <dbl>, df.residual <int>

我知道，因为它估计各大洲每个模型我可以通过by_continent（效率不高，迭代做到这一点：

formulae <- list(
    Asia=~lm(lifeExp ~ year, data = .), 
    Europe=~lm(lifeExp ~ year + pop, data = .), 
    Africa=~lm(lifeExp ~ year + gdpPercap, data = .), 
    Americas=~lm(lifeExp ~ year - 1, data = .), 
    Oceania=~lm(lifeExp ~ year + pop + gdpPercap, data = .) 
) 

for (i in 1:nrow(by_continent)) { 
    by_continent$model[[i]] <- map(by_continent$data, formulae[[i]])[[i]] 
} 

by_continent %>% 
    mutate(glance=map(model, glance)) %>% 
    unnest(glance, .drop=T) 

## A tibble: 5 × 12 
# continent r.squared adj.r.squared  sigma statistic  p.value df 
#  <fctr>  <dbl>   <dbl>  <dbl>  <dbl>   <dbl> <int> 
#1  Asia 0.4356350  0.4342026 8.9244419 304.1298 6.922751e-51  2 
#2 Europe 0.4984677  0.4956580 3.8584819 177.4093 3.186760e-54  3 
#3 Africa 0.4160797  0.4141991 7.0033542 221.2506 2.836552e-73  3 
#4 Americas 0.9812082  0.9811453 8.9703814 15612.1901 4.227928e-260  1 
#5 Oceania 0.9733268  0.9693258 0.6647653 243.2719 6.662577e-16  4 
## ... with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>, 
## deviance <dbl>, df.residual <int>

但有可能做到这一点，而不在基础跟随回圈R（和避免装修款，我不需要）

我想什么是这样的：？

by_continent <- by_continent %>% 
left_join(tibble::enframe(formulae, name="continent", value="formula")) 

by_continent %>% 
    mutate(model=map2(data, formula, est_model))

但我似乎无法想出一个可行的est_model函数。我想这个功能（H/T：https://gist.github.com/multidis/8138757）不起作用。

est_model <- function(data, formula, ...) { 
    mc <- match.call() 
    m <- match(c("formula","data"), names(mc), 0L) 
    mf <- mc[c(1L, m)] 
    mf[[1L]] <- as.name("model.frame") 
    mf <- eval(mf, parent.frame()) 
    data.st <- data.frame(mf) 

    return(data.st) 
}

（诚然，这是一个人为的例子我的实际情况是，我有大量的观察遗漏在我的数据关键自变量，所以我想，以配合上完整的观察所有变量一个模型，另一个只在休息观测变量的一个子集。）

UPDATE

我想出了一个est_model功能的工作原理（尽管可能效率不高）：

est_model <- function(data, formula, ...) { 
    map(list(data), formula, ...)[[1]] 
} 

by_continent <- by_continent %>% 
    mutate(model=map2(data, formula, est_model)) 

by_continent %>% 
    mutate(glance=map(model, glance)) %>% 
    unnest(glance, .drop=T) 

## A tibble: 5 × 12 
# continent r.squared adj.r.squared  sigma statistic  p.value df 
#  <chr>  <dbl>   <dbl>  <dbl>  <dbl>   <dbl> <int> 
#1  Asia 0.4356350  0.4342026 8.9244419 304.1298 6.922751e-51  2 
#2 Europe 0.4984677  0.4956580 3.8584819 177.4093 3.186760e-54  3 
#3 Africa 0.4160797  0.4141991 7.0033542 221.2506 2.836552e-73  3 
#4 Americas 0.9812082  0.9811453 8.9703814 15612.1901 4.227928e-260  1 
#5 Oceania 0.9733268  0.9693258 0.6647653 243.2719 6.662577e-16  4 
## ... with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>, deviance <dbl>, 
## df.residual <int>

来源

2016-12-30 LmW.

不知道为什么:(我觉得我已经做了我的研究，任何暗示，评论，指向正确的方向认识 –

我觉得很容易制作一个模型公式列表。每个型号仅适用于相应的continent。我添加一个新的列formula到嵌套的数据，以确保formula和continent是否在相同的顺序，如果他们不是。

formulae <- c(
    Asia= lifeExp ~ year, 
    Europe= lifeExp ~ year + pop, 
    Africa= lifeExp ~ year + gdpPercap, 
    Americas= lifeExp ~ year - 1, 
    Oceania= lifeExp ~ year + pop + gdpPercap 
) 

df <- gapminder %>% 
    group_by(continent) %>% 
    nest() %>% 
    mutate(formula = formulae[as.character(continent)]) %>% 
    mutate(model = map2(formula, data, ~ lm(.x, .y))) %>% 
    mutate(glance=map(model, glance)) %>% 
    unnest(glance, .drop=T) 

# # A tibble: 5 × 12 
# continent r.squared adj.r.squared  sigma statistic  p.value df  logLik  AIC  BIC 
#  <fctr>  <dbl>   <dbl>  <dbl>  <dbl>   <dbl> <int>  <dbl>  <dbl>  <dbl> 
# 1  Asia 0.4356350  0.4342026 8.9244419 304.1298 6.922751e-51  2 -1427.65947 2861.31893 2873.26317 
# 2 Europe 0.4984677  0.4956580 3.8584819 177.4093 3.186760e-54  3 -995.41016 1998.82033 2014.36475 
# 3 Africa 0.4160797  0.4141991 7.0033542 221.2506 2.836552e-73  3 -2098.46089 4204.92179 4222.66639 
# 4 Americas 0.9812082  0.9811453 8.9703814 15612.1901 4.227928e-260  1 -1083.35918 2170.71836 2178.12593 
# 5 Oceania 0.9733268  0.9693258 0.6647653 243.2719 6.662577e-16  4 -22.06696 54.13392 60.02419 
# # ... with 2 more variables: deviance <dbl>, df.residual <int>

来源

2017-01-01 03:33:02 mt1022

这可能是我也可以用modelr :: formula（）创建公式，尽管我更喜欢用公式来保留建模函数（“lm”），所以它在summary（）中显示出来，来电。Thx！ –

我发现purrr::at_depth()是做什么，我想在原来的问题与est_model()做。这是解决我现在高兴：。这里的downvote

library(gapminder) 
library(tidyverse) 
library(purrr) 
library(broom) 

fmlas <- tibble::tribble(
    ~continent, ~formula, 
    "Asia", ~lm(lifeExp ~ year, data = .), 
    "Europe", ~lm(lifeExp ~ year + pop, data = .), 
    "Africa", ~lm(lifeExp ~ year + gdpPercap, data = .), 
    "Americas", ~lm(lifeExp ~ year - 1, data = .), 
    "Oceania", ~lm(lifeExp ~ year + pop + gdpPercap, data = .) 
) 

by_continent <- gapminder %>% 
    nest(-continent) %>% 
    left_join(fmlas) %>% 
    mutate(model=map2(data, formula, ~at_depth(.x, 0, .y))) 

by_continent %>% 
    mutate(glance=map(model, glance)) %>% 
    unnest(glance, .drop=T)

来源

2017-05-09 05:23:07

为列表数据框的每一行安装不同的模型

回答

相关问题