2016-05-14

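Aggregate factor level counts - by factor

I have been trying to build a table that shows the counts of factor levels by another factor. I have gone through dozens of pages and questions and tried functions from several packages (dplyr, reshape), but I have not managed to use any of them correctly.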

This is what I have:

# my data: 
var1 <- c("red","blue","red","blue","red","red","red","red","red","red","red","red","blue","red","blue") 
var2 <- c("0","1","0","0","0","0","0","0","0","0","1","0","0","0","0") 
var3 <- c("2","2","1","1","1","3","1","2","1","1","3","1","1","2","1") 
var4 <- c("0","1","0","0","0","0","1","0","1","1","0","1","0","1","1") 
mydata <- data.frame(var1,var2,var3,var4) 
head(mydata) 

Attempt n + 1: this only shows the total sum for each factor, grouped by the other factor.

t(aggregate(. ~ var1, mydata, sum)) 

     [,1] [,2] 
var1 "blue" "red" 
var2 " 5" "12" 
var3 " 5" "18" 
var4 " 6" "16" 

Attempt n + 2: this is the right format, but I can't get it to work for more than one factor.

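library(plyr)     # ddply() comes from plyr, not dplyr
# count the rows for each var1/var3 combination
data1 <- ddply(mydata, c("var1", "var3"), summarise, 
      N = length(var1)) 
library(reshape) 
# one column per var3 level, then transpose so var1 levels become columns
df1 <- cast(data1, var1 ~ var3, sum) 
df1 <- t(df1) 
df1 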

    blue red 
1 3 6 
2 1 3 
3 0 2 

What I want is:

 blue red 
var2.0 3 10 
var2.1 1 1 
var3.1 3 6 
var3.2 1 3 
var3.3 0 2 
var4.0 2 6 
var4.1 2 5 

How can I get this format? Many thanks in advance,


Yes, edited. Thanks! – Mareviv

Answers


We can melt the dataset with 'var1' as the id variable and then use table:

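library(reshape2) 
# long format: one row per (var1, variable, value), then build "variable.value" labels
tbl <- table(transform(melt(mydata, id.vars="var1"), 
     varN = paste(variable, value, sep="."))[c("varN", "var1")]) 
names(dimnames(tbl)) <- NULL   # drop the dimension headers from the printout
tbl 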
# 
#   blue red 
# var2.0 3 10 
# var2.1 1 1 
# var3.1 3 6 
# var3.2 1 3 
# var3.3 0 2 
# var4.0 2 6 
# var4.1 2 5 
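
For comparison, the same cross-tabulation can be sketched in base R without reshaping. This is not part of the original answer: `count_by_var1` is a hypothetical helper name, and the data is repeated from the question so that the block runs on its own.

```r
# Data from the question
var1 <- c("red","blue","red","blue","red","red","red","red","red","red","red","red","blue","red","blue")
var2 <- c("0","1","0","0","0","0","0","0","0","0","1","0","0","0","0")
var3 <- c("2","2","1","1","1","3","1","2","1","1","3","1","1","2","1")
var4 <- c("0","1","0","0","0","0","1","0","1","1","0","1","0","1","1")
mydata <- data.frame(var1, var2, var3, var4)

# Hypothetical helper: cross-tabulate one column against var1 and
# prefix the row names with the column name, e.g. "var3.2"
count_by_var1 <- function(v) {
  tab <- table(mydata[[v]], mydata$var1)
  rownames(tab) <- paste(v, rownames(tab), sep = ".")
  tab
}

# Stack the per-column tables into one matrix (rows: variable.level, columns: var1 levels)
res <- do.call(rbind, lapply(c("var2", "var3", "var4"), count_by_var1))
res
```

The melt approach above scales more naturally to many columns; this version only avoids the extra package dependency.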

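Or with dplyr/tidyr: we reshape the dataset from 'wide' to 'long' format with gather, then unite the columns ('var', 'val') to create 'varV', get the frequency (tally) after grouping by 'var1' and 'varV', and then spread the result back to 'wide' format.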

library(dplyr) 
library(tidyr) 
gather(mydata, var, val, -var1) %>% 
      unite(varV,var, val, sep=".") %>% 
      group_by(var1, varV) %>% 
      tally() %>% 
      spread(var1, n, fill = 0) 
# varV blue red 
# <chr> <dbl> <dbl> 
#1 var2.0  3 10 
#2 var2.1  1  1 
#3 var3.1  3  6 
#4 var3.2  1  3 
#5 var3.3  0  2 
#6 var4.0  2  6 
#7 var4.1  2  5 
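
In more recent tidyr releases, gather and spread are superseded by pivot_longer and pivot_wider. The following is a minimal sketch of the same pipeline with those verbs, assuming tidyr 1.1.0 or later (for a scalar `values_fill`) and a reasonably recent dplyr (for `count`). It is an adaptation, not the original answer's code; the `arrange` step is added only to keep the rows ordered by label.

```r
library(dplyr)
library(tidyr)

# Data from the question
var1 <- c("red","blue","red","blue","red","red","red","red","red","red","red","red","blue","red","blue")
var2 <- c("0","1","0","0","0","0","0","0","0","0","1","0","0","0","0")
var3 <- c("2","2","1","1","1","3","1","2","1","1","3","1","1","2","1")
var4 <- c("0","1","0","0","0","0","1","0","1","1","0","1","0","1","1")
mydata <- data.frame(var1, var2, var3, var4)

res <- mydata %>%
  # wide -> long: one row per (var1, var, val)
  pivot_longer(-var1, names_to = "var", values_to = "val") %>%
  # build labels such as "var3.2"
  unite(varV, var, val, sep = ".") %>%
  # frequency of each (var1, varV) pair
  count(var1, varV) %>%
  # long -> wide: one column per var1 level, missing combinations filled with 0
  pivot_wider(names_from = var1, values_from = n, values_fill = 0) %>%
  arrange(varV)
res
```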

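Absolutely brilliant! The second approach in particular is exactly what I needed. I need to take the time to understand how to use pipes to build data frames. I really need to learn to do this myself, so thank you very much for the explanation! – Mareviv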