2014-09-04 79 views
1

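Reshaping count-summarised data back to long format in R

Embarrassingly basic question, but if you don't know... I need to reshape a data.frame of count-summarised data into what it looked like before it was summarised. This is essentially the reverse of {plyr} count(), e.g.: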

> (d = data.frame(value=c(1,1,1,2,3,3), cat=c('A','A','A','A','B','B')))
  value cat
1     1   A
2     1   A
3     1   A
4     2   A
5     3   B
6     3   B
> (summry = plyr::count(d))
  value cat freq
1     1   A    3
2     2   A    1
3     3   B    2

If you start with summry, what is the quickest way back to d? Unless I'm mistaken (very possible), {reshape2} doesn't do this.

Answers

2

Just use rep to repeat each row name freq times, then index the rows with the result:

summry[rep(rownames(summry), summry$freq), c("value", "cat")] 
#     value cat
# 1       1   A
# 1.1     1   A
# 1.2     1   A
# 2       2   A
# 3       3   B
# 3.1     3   B
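
This works because rep accepts a vector for its times argument and repeats each element the corresponding number of times. A minimal base-R sketch of the same idea follows. It rebuilds summry by hand so that it runs without plyr, and the names idx and expanded are only illustrative:

```r
# Rebuild the summary table by hand so the example does not need plyr
summry <- data.frame(value = c(1, 2, 3),
                     cat   = c("A", "A", "B"),
                     freq  = c(3, 1, 2))

# rep() with a vector `times` repeats the i-th index freq[i] times
idx <- rep(seq_len(nrow(summry)), times = summry$freq)
idx
# [1] 1 1 1 2 3 3

# Index the rows, drop the count column, and reset the row names
expanded <- summry[idx, c("value", "cat")]
rownames(expanded) <- NULL
expanded
```

Setting the row names to NULL replaces the 1, 1.1, 1.2 labels with plain consecutive integers, which is usually what you want if the result should look like the original d.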

A variation of this approach can be found in expandRows from my "SOfun" package. If you have that loaded, you can simply do:

expandRows(summry, "freq") 
+0

I didn't know 'rep' accepted a vector, many thanks! – geotheory 2014-09-04 15:35:30

+0

'SOfun' looks very useful btw – geotheory 2014-09-04 15:41:39

+0

+1 for the SOfun reference! – Henk 2014-09-04 16:04:29

1

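There is a good table-to-data-frame function on the R cookbook website that you can modify slightly. The only modifications were changing 'Freq' -> 'freq' (to be consistent with plyr::count) and making sure the rownames are reset to increasing integers.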

expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".") {
  # Take each row in the source data frame and replicate it
  # using its freq value
  DF <- lapply(seq_len(nrow(x)),
               function(i) x[rep(i, each = x$freq[i]), ])

  # rbind the resulting list into a single data frame
  # and drop the freq column
  DF <- subset(do.call("rbind", DF), select = -freq)

  # Apply type.convert to each column (via character)
  # so that an appropriate data type is selected per column
  for (i in seq_along(DF)) {
    DF[[i]] <- type.convert(as.character(DF[[i]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }

  # Reset the row names to increasing integers
  row.names(DF) <- seq_len(nrow(DF))
  DF
}

expand.dft(summry) 

  value cat
1     1   A
2     1   A
3     1   A
4     2   A
5     3   B
6     3   B
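
As a sanity check, counting and then expanding should reproduce the original rows. The following is a rough base-R sketch of that round trip. It assumes aggregate as a stand-in for plyr::count (so it runs without plyr), and the names counts and back are hypothetical:

```r
d <- data.frame(value = c(1, 1, 1, 2, 3, 3),
                cat   = c("A", "A", "A", "A", "B", "B"))

# Count duplicate rows in base R: add a column of ones and sum it per group
counts <- aggregate(freq ~ value + cat,
                    data = transform(d, freq = 1),
                    FUN = sum)

# Expand the counts back to one row per original observation
back <- counts[rep(seq_len(nrow(counts)), times = counts$freq),
               c("value", "cat")]
rownames(back) <- NULL

# Compare with the original after putting both in the same order
back <- back[order(back$value, back$cat), ]
orig <- d[order(d$value, d$cat), ]
all(back$value == orig$value) &&
  all(as.character(back$cat) == as.character(orig$cat))
```

Row order is not guaranteed to survive the round trip in general, which is why both data frames are sorted before they are compared.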