2011-10-11 80 views
7

我遇到了一个应用程序,我需要按列号对data.frame进行排序,并且usual solutions似乎都不允许这样做。在列表中订购

上下文创建as.data.frame.by方法。由于by对象的最后一列将作为值列,并且第一个ncol-1列将作为索引列。 melt返回向后排序 - 索引3,然后索引2,然后索引1.为了与latex.table.by兼容,我想对它进行排序。但是我很难以一种通用的方式来做这件事。以下功能中的注释行是我迄今为止的最佳尝试。

as.data.frame.by <- function(x, colnames=paste("IDX",seq(length(dim(x))),sep=""), ...) { 
    num.by.vars <- length(dim(x)) 
    res <- melt(unclass(x)) 
    res <- na.omit(res) 
    colnames(res)[seq(num.by.vars)] <- colnames 
    #res <- res[ order(res[ , seq(num.by.vars)]) , ] # Sort the results by the by vars in the heirarchy given 
    res 
} 

dat <- transform(ChickWeight, Time=cut(Time,3), Chick=cut(as.numeric(Chick),3)) 
my.by <- by(dat, with(dat,list(Time,Chick,Diet)), function(x) sum(x$weight)) 
> as.data.frame(my.by) 
      IDX1   IDX2 IDX3 value 
1 (-0.021,6.99] (0.951,17.3] 1 3475 
2  (6.99,14] (0.951,17.3] 1 5969 
3  (14,21] (0.951,17.3] 1 8002 
4 (-0.021,6.99] (17.3,33.7] 1 640 
5  (6.99,14] (17.3,33.7] 1 1596 
6  (14,21] (17.3,33.7] 1 2900 
13 (-0.021,6.99] (17.3,33.7] 2 2253 
14  (6.99,14] (17.3,33.7] 2 4734 
15  (14,21] (17.3,33.7] 2 7727 
22 (-0.021,6.99] (17.3,33.7] 3 666 
23  (6.99,14] (17.3,33.7] 3 1391 
24  (14,21] (17.3,33.7] 3 2109 
25 (-0.021,6.99] (33.7,50] 3 1647 
26  (6.99,14] (33.7,50] 3 3853 
27  (14,21] (33.7,50] 3 7488 
34 (-0.021,6.99] (33.7,50] 4 2412 
35  (6.99,14] (33.7,50] 4 5448 
36  (14,21] (33.7,50] 4 8101 

随着线未注释的,则返回乱码(它只是把整个data.frame作为载体,具有灾难性的结果)。

我甚至试过巧妙的东西,如res <- res[ order(...=list(res[,1],res[,2])) , ]但无济于事。

我怀疑有一个简单的方法来做到这一点,但我没有看到它。

编辑澄清:我不想指定列名称。相反,我希望能够通过数值向量对它进行排序(例如按列1:4排序)。

回答

7
mydf <- as.data.frame(my.by) 
mydf[order(mydf$IDX3, mydf$IDX2, mydf$IDX1) , ] 
      IDX1   IDX2 IDX3 value 
1 (-0.021,6.99] (0.951,17.3] 1 3475 
3  (14,21] (0.951,17.3] 1 8002 
2  (6.99,14] (0.951,17.3] 1 5969 
4 (-0.021,6.99] (17.3,33.7] 1 640 
6  (14,21] (17.3,33.7] 1 2900 
5  (6.99,14] (17.3,33.7] 1 1596 
13 (-0.021,6.99] (17.3,33.7] 2 2253 
15  (14,21] (17.3,33.7] 2 7727 
14  (6.99,14] (17.3,33.7] 2 4734 
22 (-0.021,6.99] (17.3,33.7] 3 666 
24  (14,21] (17.3,33.7] 3 2109 
23  (6.99,14] (17.3,33.7] 3 1391 
25 (-0.021,6.99] (33.7,50] 3 1647 
27  (14,21] (33.7,50] 3 7488 
26  (6.99,14] (33.7,50] 3 3853 
34 (-0.021,6.99] (33.7,50] 4 2412 
36  (14,21] (33.7,50] 4 8101 
35  (6.99,14] (33.7,50] 4 5448 

或;

my.by <- by(dat, with(dat,list(Diet,Chick, Time)), function(x) sum(x$weight)) 
mydf <- as.data.frame(my.by) 

编辑:

mydf <- as.data.frame(my.by) 
mydf[ do.call(order, mydf[, 3:1]) , ] 
+0

对不起本来应该更清楚:我想不必指定列名或使用该数值列指数产生相同的输出往上顶。相反,我希望能够通过数值向量对它进行排序(例如按列1:4排序)。 –

+0

见上文。在'help(order)'页面上说明了将数据框传递给'order'的do.call方法。 –

+0

不错。谢谢。我需要更仔细地研究'​​do.call',因为我怀疑它会解决我的许多问题:-) –