2014-10-03

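Sorting an R data frame by a column of long strings

How can I sort an R data frame by a column that consists of long strings? The following example illustrates my problem: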

> a = matrix(NA, nrow=4, ncol=3) 
> a[,1] = c(1,2,3,4) 
> a[,2] = c("gene001_10M","gene002_10M","gene001_50M","gene002_50M") 
> colnames(a) = c("value","sortkey","other") 
> a = as.data.frame(a) 
> a 
    value  sortkey other 
1  1 gene001_10M <NA> 
2  2 gene002_10M <NA> 
3  3 gene001_50M <NA> 
4  4 gene002_50M <NA> 

When I now sort "a", sortkey seems to be read from right to left, which leaves "a" unchanged:

> b = a[sort(a$sortkey),] 
> b 
    value  sortkey other 
1  1 gene001_10M <NA> 
2  2 gene002_10M <NA> 
3  3 gene001_50M <NA> 
4  4 gene002_50M <NA> 

What I want, however, is:

> b 
    value  sortkey other 
1  1 gene001_10M <NA> 
3  3 gene001_50M <NA> 
2  2 gene002_10M <NA> 
4  4 gene002_50M <NA> 

Answers


When the keys mix numbers and letters, it is usually better to use mixedorder from gtools, but here order alone works:

a[order(as.character(a$sortkey)),] 
    # value  sortkey other 
    #1  1 gene001_10M <NA> 
    #3  3 gene001_50M <NA> 
    #2  2 gene002_10M <NA> 
    #4  4 gene002_50M <NA> 

Also, sort gives you the sorted values rather than the index:

sort(as.character(a$sortkey)) 
    #[1] "gene001_10M" "gene001_50M" "gene002_10M" "gene002_50M" 

Otherwise, you have to specify index.return=TRUE, which is FALSE by default in sort:

sort(as.character(a$sortkey), index.return=TRUE) 
    #$x 
    #[1] "gene001_10M" "gene001_50M" "gene002_10M" "gene002_50M" 

    #$ix 
    #[1] 1 3 2 4 

Then use:

a[sort(as.character(a$sortkey), index.return=TRUE)$ix,] 
    # value  sortkey other 
    #1  1 gene001_10M <NA> 
    #3  3 gene001_50M <NA> 
    #2  2 gene002_10M <NA> 
    #4  4 gene002_50M <NA> 
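
To see why the a[sort(a$sortkey),] call from the question left the rows unchanged, here is a minimal sketch. It assumes sortkey is a factor, which was the default result of as.data.frame() before R 4.0; the stringsAsFactors=TRUE argument and the variable names below are only there to reproduce that situation and are not part of the original post.

```r
# The question's data, with sortkey forced to a factor (assumption: pre-4.0 default)
a <- data.frame(
  value   = 1:4,
  sortkey = c("gene001_10M", "gene002_10M", "gene001_50M", "gene002_50M"),
  stringsAsFactors = TRUE
)

# sort() returns the sorted values, still as a factor
sorted_keys <- sort(a$sortkey)

# A factor used as an index is read as its integer codes,
# and the codes of an already sorted factor are simply 1, 2, 3, 4
codes <- as.integer(sorted_keys)

# So indexing by the sorted factor selects rows 1:4 in their original order
unchanged <- a[sorted_keys, ]

# order() returns the row positions that put the keys in sorted order
positions <- order(a$sortkey)
reordered <- a[positions, ]
```

In other words, the key was not read from right to left; sort simply hands back values where a row index is needed, and order supplies that index.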

Also,

library(gtools) 
mixedorder(as.character(a$sortkey)) 
    #[1] 1 3 2 4 
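
The difference between order and a mixed ordering only shows up when the embedded numbers are not zero-padded. The following base-R sketch uses made-up keys (not from the question) and a hand-written digit extraction as a stand-in for what a natural sort does; it is an illustration under those assumptions, not how gtools implements mixedorder.

```r
# Hypothetical keys without zero padding (invented for illustration)
keys <- c("gene10", "gene2", "gene1")

# Plain order() compares character by character, so "gene10" sorts before "gene2"
lexical <- order(keys)

# Pull out the first run of digits and order by its numeric value instead
first_number <- as.numeric(sub("^[^0-9]*([0-9]+).*$", "\\1", keys))
natural <- order(first_number)

keys[lexical]
keys[natural]
```

With the zero-padded keys in the question (gene001, gene002) both orderings agree, which is why order alone is enough there.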

Great! Thank you. – 2014-10-03 11:18:59


You can also use order after first stripping the letters with a gsub regular expression:

a[order(gsub("[a-zA-Z]+", "", a$sortkey)),] 
# value  sortkey other 
# 1  1 gene001_10M <NA> 
# 3  3 gene001_50M <NA> 
# 2  2 gene002_10M <NA> 
# 4  4 gene002_50M <NA>
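
As a small sketch of what the gsub step produces, the stripped keys can be inspected before ordering. The stripped keys are still compared as strings, so the second half of the block shows one possible way to compare the two parts numerically instead; the names stripped, gene and depth are hypothetical and only used for this illustration.

```r
sortkey <- c("gene001_10M", "gene002_10M", "gene001_50M", "gene002_50M")

# Remove every run of letters; what remains is "001_10", "002_10", "001_50", "002_50"
stripped <- gsub("[a-zA-Z]+", "", sortkey)
by_string <- order(stripped)

# Assumption: the part before "_" is a gene number and the part after is a depth.
# Splitting and converting to numeric makes the comparison numeric, not textual.
parts <- strsplit(stripped, "_", fixed = TRUE)
gene  <- as.numeric(sapply(parts, `[`, 1))
depth <- as.numeric(sapply(parts, `[`, 2))
by_number <- order(gene, depth)

sortkey[by_number]
```

For the question's data both approaches give the same row order; the numeric variant would only matter for keys such as a 5M depth next to a 10M depth, where string comparison would put 10 first.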