2016-05-21 25 views

Tidyr: how to spread into a count of occurrences

other=data.frame(name=c("a","b","a","c","d"),result=c("Y","N","Y","Y","N")) 

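I have a data frame like the one above. How can I use the spread function in tidyr, or some other function, to get the counts of Y and N results as column headers, like this: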

name  Y N 
a   2 0 
b   0 1 

Thanks


Looking for table(other)? – mtoto

Answers


Here are a few of the many ways to do this:

1) With the dplyr library, you can simply group the rows and count them into the required format:

library(dplyr) 
other %>% group_by(name) %>% summarise(N = sum(result == 'N'), Y = sum(result == 'Y')) 
# Source: local data frame [4 x 3]
#
#     name     N     Y
#   <fctr> <int> <int>
# 1      a     0     2
# 2      b     1     0
# 3      c     0     1
# 4      d     1     0

2) You can combine table with tidyr's spread as follows:

library(tidyr) 
spread(as.data.frame(table(other)), result, Freq) 
#   name N Y
# 1    a 0 2
# 2    b 1 0
# 3    c 0 1
# 4    d 1 0

3) You can combine dplyr and tidyr as follows:

library(dplyr) 
library(tidyr) 
spread(count(other, name, result), result, n, fill = 0) 
# Source: local data frame [4 x 3]
# Groups: name [4]
#
#     name     N     Y
#   <fctr> <dbl> <dbl>
# 1      a     0     2
# 2      b     1     0
# 3      c     0     1
# 4      d     1     0
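
As a hedged aside on approach 3: in newer tidyr releases (1.0.0 and later, which is an assumption about the installed version), spread has been superseded by pivot_wider. A minimal sketch of the same count-then-widen idea, with the intermediate name wide chosen purely for illustration, might look like this:

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question
other <- data.frame(name   = c("a", "b", "a", "c", "d"),
                    result = c("Y", "N", "Y", "Y", "N"))

# Count each (name, result) pair, then turn the result values into columns.
# values_fill supplies 0 for combinations that never occur.
wide <- other %>%
  count(name, result) %>%
  pivot_wider(names_from = result, values_from = n,
              values_fill = list(n = 0L))

print(wide)
```

The named-list form of values_fill is used here because it is accepted by both tidyr 1.0.0 and later versions.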

Here is another option, using dcast from data.table:

library(data.table) 
dcast(setDT(other), name~result, length) 
# name N Y 
#1: a 0 2 
#2: b 1 0 
#3: c 0 1 
#4: d 1 0 

Although table(other) would be a compact option (from @mtoto's comment), for large datasets it may be more efficient to use dcast. Some benchmarks are given below:

set.seed(24) 
other1 <- data.frame(name = sample(letters, 1e6, replace=TRUE), 
    result = sample(c("Y", "N"), 1e6, replace=TRUE), stringsAsFactors=FALSE) 

other2 <- copy(other1) 

gopala1 <- function() other1 %>% 
          group_by(name) %>% 
          summarise(N = sum(result == 'N'), Y = sum(result == 'Y')) 
gopala2 <- function() spread(as.data.frame(table(other1)), result, Freq) 
gopala3 <- function() spread(count(other1, name, result), result, n, fill = 0) 
akrun <- function() dcast(as.data.table(other2), name~result, length) 


library(microbenchmark) 
microbenchmark(gopala1(), gopala2(), gopala3(), 
        akrun(), unit='relative', times = 20L) 
#      expr      min       lq     mean   median       uq      max neval
# gopala1() 2.710561 2.331915 2.142183 2.325167 2.134399 1.513725    20
# gopala2() 2.859464 2.564126 2.531130 2.683804 2.720833 1.982760    20
# gopala3() 2.345062 2.076400 1.953136 2.027599 1.882079 1.947759    20
#   akrun() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    20
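
For completeness, the compact table(other) option from the comment can be sketched in base R alone. This is only an illustration: the conversion to a data frame with a name column is one possible way to match the layout asked for in the question, and the variable names tab and wide are made up for the example.

```r
# Same toy data as in the question
other <- data.frame(name   = c("a", "b", "a", "c", "d"),
                    result = c("Y", "N", "Y", "Y", "N"))

# Cross-tabulate name against result; each cell is an occurrence count
tab <- table(other)

# Turn the contingency table into a wide data frame, one row per name
wide <- as.data.frame.matrix(tab)
wide$name <- rownames(wide)
rownames(wide) <- NULL
wide <- wide[, c("name", "N", "Y")]

print(wide)
```

No extra packages are needed here, which is what makes this option compact, though the benchmark above suggests it may be slower on large inputs.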