2016-06-28 80 views
0

我想按dataframe2列的顺序重新排列dataframe1根据另一个文件中的第一列对文件中的数据进行排序

Dataframe1

 N02_M N05_F N06_M N07_F N08_F N09_M N02_M N026_F N03_M 
2237895 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 
586  0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 
2255280 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 
7294280 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 
7499 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803 
35209 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94 

这里是数据帧2:

Dataframe2

586 
2237895 
7499 
35209 
2255280 
7294280 

结果:

 N02_M N05_F N06_M N07_F N08_F N09_M N02_M N026_F N03_M 
586  0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 
2237895 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 
7499 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803 
35209 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94 
2255280 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 
7294280 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 

我试着命令awk的

'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' df2 df1 

,但它不工作,它不会改变的df2

+0

你的问题有点含糊。可以解释一点点?在结果中,“586”如何放置在第一行? – 989

+0

结果中的顺序对我来说看起来是随机的 – Tung

+2

我不明白你是如何得到结果的......结果显示了同样的6行,但顺序似乎不是来自第2个数据帧...后者只有2个值出现在第一个表格中(586和35209)。 –

回答

1
$ cat tst.awk 
NR==FNR { 
    if (NR==1) { 
     print 
    } 
    else { 
     map[$1] = $0 
    } 
    next 
} 
{ print map[$1] } 

$ awk -f tst.awk dataframe1 dataframe2 
     N02_M N05_F N06_M N07_F N08_F N09_M N02_M N026_F N03_M 
586  0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 
2237895 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 0.6225 
7499 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803 0.803 
35209 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94 0.94 
2255280 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 0.995 
7294280 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 
0

顺序下面是用dplyr

library(dplyr) 
df1$id <- rownames(df1) # create common column "id" based on rownames 

names(df2) <- "id" #change column name of df2 to common name "id" 
df2$id <- as.character(df2$id) #ensure that the structure for df1$id and df2$id are in similar structure 


left_join(df2, df1) 

     id N02_M N05_F N06_M N07_F N08_F N09_M N02_M.1 N026_F N03_M 
1 34387  NA  NA  NA  NA  NA  NA  NA  NA  NA 
2 202783  NA  NA  NA  NA  NA  NA  NA  NA  NA 
3 887320  NA  NA  NA  NA  NA  NA  NA  NA  NA 
4 35444  NA  NA  NA  NA  NA  NA  NA  NA  NA 
5 88035  NA  NA  NA  NA  NA  NA  NA  NA  NA 
6 239862  NA  NA  NA  NA  NA  NA  NA  NA  NA 
7 9266359  NA  NA  NA  NA  NA  NA  NA  NA  NA 
8  586 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 0.8364 
9 375535  NA  NA  NA  NA  NA  NA  NA  NA  NA 
10 3794260  NA  NA  NA  NA  NA  NA  NA  NA  NA 
11 6687758  NA  NA  NA  NA  NA  NA  NA  NA  NA 
12 6983267  NA  NA  NA  NA  NA  NA  NA  NA  NA 
13 704346  NA  NA  NA  NA  NA  NA  NA  NA  NA 
14 35209 0.9400 0.9400 0.9400 0.9400 0.9400 0.9400 0.9400 0.9400 0.9400 
15 795668  NA  NA  NA  NA  NA  NA  NA  NA  NA 

解决顺便说一句,df1 == Dataframe1和df2 == Dataframe2

0

让我们准备好数据。

Dataframe1 <- cbind(a=rnorm(6), b=rnorm(6)) 
rownames(Dataframe1) <- round(rnorm(6)*1000000) 
Dataframe2 <- cbind(sample(as.integer(rownames(Dataframe1)))) 

Dataframe1:

    a   b 
255728 -0.2063562 -0.804514251 
215067 0.2756317 -0.865951064 
2079729 1.1376277 0.001123908 
-898713 0.1846505 -0.908527352 
-717953 1.7424731 0.462088593 
-1028794 0.4162211 1.548455286 

Dataframe2:

  [,1] 
[1,] -1028794 
[2,] -898713 
[3,] -717953 
[4,] 215067 
[5,] 2079729 
[6,] 25572 

而现在......戏法!

Dataframe1[order(rownames(Dataframe1))[rank(Dataframe2)],] 
        a   b 
-1028794 0.4162211 1.548455286 
2079729 1.1376277 0.001123908 
215067 0.2756317 -0.865951064 
255728 -0.2063562 -0.804514251 
-898713 0.1846505 -0.908527352 
-717953 1.7424731 0.462088593 

这种排序的想法是使用order(x)[rank(y)]

相关问题