2015-09-10

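R: percentage of interactions of variables by group

Below is a simplified version of my data frame, which actually has many more rows and columns.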

df <- data.frame(category=c("con","con","con","con","con","con", 
"tre","tre","tre","tre","tre","tre"), 
answer=c(1,0,1,0,0,0,1,0,0,1,1,1), 
female=c(1,1,0,0,0,0,1,1,1,0,0,0), 
married=c(1,1,1,0,0,0,0,1,1,0,0,0)) 

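I need to create a new data frame in R that

  • is grouped by the variable "category", and
  • shows, for each independent variable, the percentage of respondents with that attribute whose dependent variable "answer" is 1.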

Below is the data frame I am trying to create.

needed <- data.frame(category=c("con", "tre"), 
female=c(50, 33.33),  
married=c(66.66, 0)) 

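For example, it shows that

  • 33.33% of the females in the treatment group answered the question.
  • 66.66% of the married people in the control group answered the question.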

Thank you very much for your help.

Answers


Here is a possible dplyr implementation that does this in one go:

library(dplyr) 
df %>% 
    group_by(category) %>% 
    summarise_each(funs(sum(.[answer == 1])/sum(.)), -answer) 

# Source: local data frame [2 x 3] 
# 
# category female married 
#  (fctr)  (dbl)  (dbl) 
# 1  con 0.5000000 0.6666667 
# 2  tre 0.3333333 0.0000000 
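The summarise_each() and funs() helpers used above have since been deprecated in dplyr. A minimal sketch of the same calculation with across(), assuming dplyr 1.0.0 or later and scaling to percentages to match the requested output (the name `result` is only illustrative):

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(category = c("con","con","con","con","con","con",
                              "tre","tre","tre","tre","tre","tre"),
                 answer  = c(1,0,1,0,0,0,1,0,0,1,1,1),
                 female  = c(1,1,0,0,0,0,1,1,1,0,0,0),
                 married = c(1,1,1,0,0,0,0,1,1,0,0,0))

result <- df %>%
  group_by(category) %>%
  # For each indicator column: among rows where the indicator is 1,
  # the share (in percent) of rows where answer == 1
  summarise(across(c(female, married),
                   ~ 100 * sum(.x[answer == 1]) / sum(.x)))

result
```

Naming the columns explicitly in across() avoids having to exclude answer, at the cost of listing every indicator column; a selection such as `-answer` should also work when there are many columns.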

You can do something similar with data.table, working over your columns, but you will also get an extra answer column in the result:

library(data.table) 
setDT(df)[, lapply(.SD, function(x) sum(x[answer == 1])/sum(x)), by = category] 
# category answer female married 
# 1:  con  1 0.5000000 0.6666667 
# 2:  tre  1 0.3333333 0.0000000 

Issue #495 has now been resolved with this recent commit, so we can do this just fine:

require(data.table) # v1.9.7+ 
setDT(df)[, lapply(.SD, function(x) sum(x[answer==1])/sum(x)), by=category, .SDcols=-"answer"] 
# category female married 
# 1:  con 0.5000000 0.6666667 
# 2:  tre 0.3333333 0.0000000 

Adding the obligatory base R idea:

rowsum((df$answer & df[c("female", "married")]) + 0L, df$category)/
rowsum(df[c("female", "married")], df$category) 
#  female married 
#con 0.5000000 0.6666667 
#tre 0.3333333 0.0000000 
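The one-liner above is compact, so here is a sketch of the same idea broken into named steps. The intermediate names `vars`, `hits`, `num`, and `den` are only illustrative, and the result is scaled to percentages:

```r
# Same example data as in the question
df <- data.frame(category = c("con","con","con","con","con","con",
                              "tre","tre","tre","tre","tre","tre"),
                 answer  = c(1,0,1,0,0,0,1,0,0,1,1,1),
                 female  = c(1,1,0,0,0,0,1,1,1,0,0,0),
                 married = c(1,1,1,0,0,0,0,1,1,0,0,0))

vars <- c("female", "married")

# 1 where a respondent has the attribute AND answered, 0 otherwise
hits <- (df$answer & df[vars]) + 0L

# Per-category count of respondents with the attribute who answered
num <- rowsum(hits, df$category)

# Per-category count of all respondents with the attribute
den <- rowsum(df[vars], df$category)

pct <- 100 * num / den
pct
```

rowsum() sums the rows of each column within the levels of the grouping vector, so both the numerator and the denominator come out with one row per category.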

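Another option is to use split with colSums. We split the dataset by 'category' to get a list output. We can then loop over that list with sapply and, for each of 'con' and 'tre', take the colSums of the relevant columns restricted to the rows where answer is 1, and divide by the colSums of the same columns over all rows in the group.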

t(sapply(split(df, df$category), function(x) 
      100*with(x, colSums(x[answer==1,3:4])/colSums(x[3:4])))) 
#  female married 
#con 50.00000 66.66667 
#tre 33.33333 0.00000