2017-09-13 49 views
0

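Writing a function in R that groups by a variable column of a data frame

I am trying to write a function that generates descriptive statistics by grouping across multiple factors in a data frame. I have spent far too much time trying to get the function to recognize the variable I pass in.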

Here is some fake data:

grouping1 <- c("red", "blue", "blue", "green", "red", "blue", "red", "green")     
grouping2 <- c("high", "high", "low", "medium", "low", "high", "medium", "high")     
value <- c(22,40,72,41,36,16,88,99) 

fake_df <- data.frame(grouping1, grouping2, value) 

Fake code example:

library(dplyr) 

by_group_fun <- function(fun.data.in, fun.grouping.factor){ 
    fake_df2 <- fun.data.in %>% 
    group_by(fun.grouping.factor) %>% 
    summarize(mean = mean(value), median = median(value)) 
    fake_df2 
} 
by_group_fun(fake_df, grouping1) 
by_group_fun(fake_df, grouping2) 

This gives me:

Error in grouped_df_impl(data, unname(vars), drop) : 
    Column `fun.grouping.factor` is unknown 

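Second attempt

I tried assigning the variable chosen in the function call to a new column and grouping by that column instead.

Fake code example (second attempt):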

by_group_fun2 <- function(fun.data.in, fun.grouping.factor){ 
    fun.data.in$by_var <- fun.data.in$fun.grouping.factor 

    fake_df2 <- fun.data.in %>% 
    group_by(by_var) %>% 
    summarize(mean = mean(value), median = median(value)) 
    fake_df2 
} 

by_group_fun2(fake_df, grouping1) 
by_group_fun2(fake_df, grouping2) 

This second attempt gives me:

Error in grouped_df_impl(data, unname(vars), drop) : 
    Column `by_var` is unknown 
+1

See this to learn how to program with 'dplyr': https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html – www

Answers

0

Use this example as a guide:

myfun <- function(df, thesecols) { 
       require(dplyr) 
       thesecols <- enquo(thesecols) # need to quote 
       df %>% 
       group_by_at(vars(!!thesecols)) # !! unquotes 
     } 

myfun(fake_df, grouping1) 

Output

# A tibble: 8 x 3 
# Groups: grouping1 [3] 
    grouping1 grouping2 value 
    <fctr> <fctr> <dbl> 
1  red  high 22 
2  blue  high 40 
3  blue  low 72 
4  green medium 41 
5  red  low 36 
6  blue  high 16 
7  red medium 88 
8  green  high 99 
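
The example above stops at the grouping step. As a minimal sketch of how it could be carried through to the mean and median the question asks for, the function below captures the column with `enquo()` and unquotes it with `!!`, but passes it to plain `group_by()` rather than `group_by_at(vars())`. The name `by_group_summary` and its argument names are made up for illustration, and the sketch assumes a dplyr version with tidy evaluation (0.7.0 or later).

```r
library(dplyr)

# Same fake data as in the question
fake_df <- data.frame(
  grouping1 = c("red", "blue", "blue", "green", "red", "blue", "red", "green"),
  grouping2 = c("high", "high", "low", "medium", "low", "high", "medium", "high"),
  value = c(22, 40, 72, 41, 36, 16, 88, 99)
)

# by_group_summary is a hypothetical name, not part of dplyr
by_group_summary <- function(df, grouping_col) {
  grouping_col <- enquo(grouping_col)   # capture the bare column name
  df %>%
    group_by(!!grouping_col) %>%        # unquote it inside group_by()
    summarise(mean = mean(value), median = median(value))
}

res1 <- by_group_summary(fake_df, grouping1)
res2 <- by_group_summary(fake_df, grouping2)
```

With rlang 0.4.0 or later, `group_by({{ grouping_col }})` is shorthand for the `enquo()` / `!!` pair, so the explicit capture line can be dropped.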
2

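A very simple way to get the same output without resorting to programming with dplyr is to gather the grouping columns into long form. Grouping by both the resulting key and value columns then gives you everything you asked for, combined in a single data.frame: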

library(tidyverse) 

# tibble() replaces data_frame(), which is deprecated in current tibble releases
fake_df <- tibble(grouping1 = c("red", "blue", "blue", "green", "red", "blue", "red", "green"),
                  grouping2 = c("high", "high", "low", "medium", "low", "high", "medium", "high"),
                  value = c(22,40,72,41,36,16,88,99))

fake_df %>% 
    gather(group_var, group_val, -value) %>% 
    group_by(group_var, group_val) %>% 
    summarise(mean = mean(value), 
       median = median(value)) 
#> # A tibble: 6 x 4 
#> # Groups: group_var [?] 
#> group_var group_val  mean median 
#>  <chr>  <chr> <dbl> <dbl> 
#> 1 grouping1  blue 42.66667 40.0 
#> 2 grouping1  green 70.00000 70.0 
#> 3 grouping1  red 48.66667 36.0 
#> 4 grouping2  high 44.25000 31.0 
#> 5 grouping2  low 54.00000 54.0 
#> 6 grouping2 medium 64.50000 64.5 
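
In newer tidyr releases `gather()` is superseded by `pivot_longer()`. The following is a sketch of the same long-form approach under that assumption (tidyr 1.0.0 or later, and dplyr 1.0.0 or later for the `.groups` argument). The result name `long_summary` is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same fake data as above
fake_df <- tibble(
  grouping1 = c("red", "blue", "blue", "green", "red", "blue", "red", "green"),
  grouping2 = c("high", "high", "low", "medium", "low", "high", "medium", "high"),
  value = c(22, 40, 72, 41, 36, 16, 88, 99)
)

long_summary <- fake_df %>%
  # Stack every column except `value` into name/value pairs
  pivot_longer(cols = -value, names_to = "group_var", values_to = "group_val") %>%
  group_by(group_var, group_val) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarise(mean = mean(value), median = median(value), .groups = "drop")

long_summary
```

This relies on both grouping columns having the same type (character here). Columns of different types would need converting before they can be stacked into one `group_val` column.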