2015-01-26

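Problem using dplyr inside a function (group_by)

I want to use dplyr for some data manipulation. Background: I have a survey weight and a bunch of variables (mostly Likert items). I want to summarise the frequency and percentage of each category, both with and without the survey weight.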

As an example, let's use the frequencies of the gender variable. The result should look like this:

gender freq freq.weighted
     1  292      922.2906
     2  279      964.7551
     9    6       21.7338

I will be doing this for many variables. So I decided to put the dplyr code inside a function, so that I only have to change the variable name and type less.

library(dplyr)
library(lazyeval)  # provides interp()

# example data
gender<-c("2","2","1","2","2","2","2","2","2","2","2","2","1","1","2","2","2","2","2","2","1","2","2","2","2","2","2","2","2","2") 
survey_weight<-c("2.368456","2.642901","2.926698","3.628653","3.247463","3.698195","2.776772","2.972387","2.686365","2.441820","3.494899","3.133106","3.253514","3.138839","3.430597","3.769577","3.367952","2.265350","2.686365","3.189538","3.029999","3.024567","2.972387","2.730978","4.074495","2.921552","3.769577","2.730978","3.247463","3.230097") 
test_dataframe<-data.frame(gender,survey_weight) 

#function 
weighting.function<-function(dataframe,variable){ 
    test_weighted<- dataframe %>% 
    group_by_(variable) %>% 
    summarise_(interp(freq=count(~weight)), 
       interp(freq_weighted=sum(~weight))) 
    return(test_weighted) 
} 

result_dataframe<-weighting.function(test_dataframe,"gender") 

#this second step was left out in this example: 
#mutate_(perc=interp(~freq/sum(~freq)*100),perc_weighted=interp(~freq_weighted/sum(~freq_weighted)*100)) 

This results in the following error message:

Error in UseMethod("group_by_") : 
    no applicable method for 'group_by_' applied to an object of class "formula" 

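I have tried a lot of different things. At first I used freq=n() to count the frequencies, but I always got an error (I checked that plyr was loaded before dplyr rather than after; that did not help either).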

Any ideas? I read the vignette on standard evaluation, but I keep running into problems and have no idea what the solution could be.

Answer

I think you have a few nested mistakes that are causing your problems. The biggest one is using count() inside summarise(). I'm guessing you wanted n():

weighting.function <- function(dataframe, variable){ 
    dataframe %>% 
    group_by_(variable) %>% 
    summarise_(
     freq = ~n(), 
     freq_weighted = ~sum(survey_weight) 
    ) 
} 

weighting.function(test_dataframe, ~gender) 
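
One thing to watch for when running this on the example data: survey_weight is created from quoted strings, so the column is a character vector (or a factor, depending on the R version and stringsAsFactors), and sum() will refuse it. A minimal sketch of the conversion, using only the first three rows of the question's data to keep it short:

```r
# First three rows of the question's example data; weights are stored as strings.
test_dataframe <- data.frame(
  gender = c("2", "2", "1"),
  survey_weight = c("2.368456", "2.642901", "2.926698")
)

# as.character() first, so the conversion is also correct if the column is a factor
# (as.numeric() on a factor alone would return the internal level codes).
test_dataframe$survey_weight <-
  as.numeric(as.character(test_dataframe$survey_weight))

total <- sum(test_dataframe$survey_weight)
```

After this step the weight column is numeric and can be summed inside summarise_().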

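You also had some unnecessary uses of interp(). If you do use interp(), the call should look like freq = interp(~n()), i.e. the name goes outside the call to interp(), and the expression being interpolated starts with ~.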
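
For what it's worth, the underscore verbs (group_by_(), summarise_()) were later deprecated in dplyr. Below is a rough sketch of the same idea in current dplyr, assuming dplyr >= 1.0. The function name weighting_function, the weight argument, and the five-row toy data are illustrative and not part of the original post. The weights are converted to numeric inside the function because the example data stores them as strings, and the percentage step that the question left out is included at the end.

```r
library(dplyr)

# Hypothetical rewrite of the function above for dplyr >= 1.0.
# `variable` and `weight` are column names passed as strings.
weighting_function <- function(dataframe, variable, weight = "survey_weight") {
  dataframe %>%
    group_by(across(all_of(variable))) %>%
    summarise(
      freq = n(),
      # works whether the weight column is character, factor, or numeric
      freq_weighted = sum(as.numeric(as.character(.data[[weight]])))
    ) %>%
    # with a single grouping variable the summary is ungrouped,
    # so sum() here runs over all categories
    mutate(
      perc = freq / sum(freq) * 100,
      perc_weighted = freq_weighted / sum(freq_weighted) * 100
    )
}

# Toy data: the first five rows of the question's example.
toy <- data.frame(
  gender = c("2", "2", "1", "2", "2"),
  survey_weight = c("2.368456", "2.642901", "2.926698", "3.628653", "3.247463")
)

result <- weighting_function(toy, "gender")
```

Passing the column name as a string keeps the call close to the original weighting.function(test_dataframe, "gender") from the question.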