2017-08-01 71 views

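How do I group by a variable and summarise using ddply? I want to group and summarise by a variable that is passed into a function.

For example: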

library(plyr) 

sample <- function(x, g){ 
    print(g) 
    print(x[[g]]) 
    res = ddply(x, ~x[[g]], summarise, value = mean(value)) 
    return(res) 
} 

x = data.frame(type = c('a', 'a', 'a', 'b'), 
       age = c(20, 21, 21, 10), 
       value = c(100, 120, 121, 150)) 
sample(x = x, g = 'age') 

This fails with:

Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, : 
    object 'g' not found 

even though the function prints:

[1] "age" 
[1] 20 21 21 10 

Why does R find g when it comes to printing, but not when it comes to grouping?

Edit: the output I expect is:

  x[["age"]] value
1         10 150.0
2         20 100.0
3         21 120.5

Answers


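Here is a solution using the dplyr package. To get the grouping to evaluate correctly inside the function, I had to use group_by_, which is slated for deprecation.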

library(dplyr) 

x = data.frame(type = c('a', 'a', 'a', 'b'), 
       age = c(20, 21, 21, 10), 
       value = c(100, 120, 121, 150)) 

sample <- function(x, g){ 
    print(g) 
    print(x[[g]]) 
    res<- group_by_(x, g) %>% summarise(mean(value)) 
    #res = ddply(x, ~x[[g]], summarise, value = mean(value)) 
    return(res) 
} 

sample(x = x, g = 'age') 
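
Since group_by_ is on its way out, the same idea can be sketched without it. This is only a minimal sketch, assuming dplyr 1.0.0 or later (for across(), all_of() and the .groups argument); the helper name mean_by_chr is made up for illustration and is not part of the answer above.

```r
library(dplyr)

# Hypothetical helper name; g is a character string such as "age".
mean_by_chr <- function(x, g) {
  x %>%
    # all_of() selects the column whose name is stored in g
    group_by(across(all_of(g))) %>%
    # .groups = "drop" returns an ungrouped result
    summarise(value = mean(value), .groups = "drop")
}

x <- data.frame(type = c("a", "a", "a", "b"),
                age = c(20, 21, 21, 10),
                value = c(100, 120, 121, 150))

res <- mean_by_chr(x, "age")
print(res)
```

The column name stays a plain string here, so the call looks the same as in the question: mean_by_chr(x, "age").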

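The problem is the "="; try calling your function like this:

sample(x = x, g <- 'age')

(this presumably works only because g <- 'age' also assigns g in the calling environment, where ddply can find it), or you can simply pass g itself as the grouping variable, which sidesteps the environment issue:

# g instead of ~x[[g]]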
res = ddply(x, g, summarise, value = mean(value)) 
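
Dropped back into the question's function, that line might look like the following. This is a minimal sketch, assuming plyr is installed; the helper name mean_by is made up here (to avoid masking base::sample), and it relies on ddply accepting a character vector of column names as its grouping argument.

```r
library(plyr)

# Hypothetical helper name, not from the question.
mean_by <- function(x, g) {
  # g is a character string such as "age"; passing it directly means
  # no formula is built, so g never has to be looked up by ddply.
  ddply(x, g, summarise, value = mean(value))
}

x <- data.frame(type = c("a", "a", "a", "b"),
                age = c(20, 21, 21, 10),
                value = c(100, 120, 121, 150))

res <- mean_by(x, "age")
print(res)
```

With this form the grouping column comes back named age rather than x[["age"]].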

I would use tidyeval, which comes with the latest version of dplyr:

library(dplyr)

sample <- function(x, g){
    # capture the unquoted column name passed as g
    var <- dplyr::enquo(g)
    res = x %>% group_by(!!var) %>% summarise(age_mean = mean(value))
    return(res)
}

x = data.frame(type = c('a', 'a', 'a', 'b'), 
      age = c(20, 21, 21, 10), 
      value = c(100, 120, 121, 150)) 
sample(x, age) 

# A tibble: 3 x 2
    age age_mean
  <dbl>    <dbl>
1    10    150.0
2    20    100.0
3    21    120.5
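
The enquo() plus !! pair above can also be written with the embrace operator. This is a minimal sketch, assuming dplyr 1.0.0 or later (with rlang 0.4.0 or later for the {{ }} syntax); the names mean_by_col and value_mean are made up for illustration.

```r
library(dplyr)

# Hypothetical helper name; g is an unquoted column name such as age.
mean_by_col <- function(x, g) {
  x %>%
    # {{ g }} forwards the caller's column name, like enquo() followed by !!
    group_by({{ g }}) %>%
    summarise(value_mean = mean(value), .groups = "drop")
}

x <- data.frame(type = c("a", "a", "a", "b"),
                age = c(20, 21, 21, 10),
                value = c(100, 120, 121, 150))

res <- mean_by_col(x, age)
print(res)
```

As in the answer above, the column is passed unquoted: mean_by_col(x, age), not mean_by_col(x, "age").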