2016-04-25 144 views
1

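How to merge multiple matrices of different dimensions in R

I have these matrices of different dimensions. The key.related.sheet column in all of them has some values in common and some unique values. I want to match the common rows and merge all three matrices, but I also want to include the unique rows. The result should contain only the columns key.related.sheet, sample_B, trace_1, trace_2 and trace_3. Can anyone help me?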

aa<-structure(c("S05-F13-P01:S05-F13-P01", "S05-F13-P01:S08-F10-P01", 
"S05-F13-P01:S08-F11-P01", "S05-F13-P01:S09-F66-P01", "S05-F13-P01", 
"S08-F10-P01", "S08-F11-P01", "S09-F66-P01", "1.25", "0.227", 
"-0.183", "-0.217"), .Dim = c(4L, 3L), .Dimnames = list(NULL, 
    c("key.related.sheet", "sample_B", "trace_1"))) 

bb<-structure(c("S05-F13-P01:S08-F10-P01", "S05-F13-P01:S08-F11-P01", 
"S05-F13-P01:S09-F66-P01", "S05-F13-P01:S09-F67-P01", "S08-F10-P01", 
"S08-F11-P01", "S09-F66-P01", "S09-F67-P01", "0.227", "-0.183", 
"-0.217", "0.292", "Unknown", "Unknown", "Unknown", "Unknown" 
), .Dim = c(4L, 4L), .Dimnames = list(NULL, c("key.related.sheet", 
"sample_B", "trace_2", "type"))) 

cc<-structure(c("S05-F13-P01:S08-F11-P01", "S05-F13-P01:S09-F66-P01", 
"S05-F13-P01:S09-F67-P01", "S05-F13-P01:S09-F68-P01", "S05-F13-P01:S09-F01-P01", 
"S08-F11-P01", "S09-F66-P01", "S09-F67-P01", "S09-F68-P01", "S09-F01-P01", 
"-0.183", "-0.217", "0.292", "-0.314", "0.0418"), .Dim = c(5L, 
3L), .Dimnames = list(NULL, c("key.related.sheet", "sample_B", 
"trace_3"))) 

The expected result would be:

key.related.sheet         sample_B      trace_1  trace_2  trace_3
"S05-F13-P01:S05-F13-P01" "S05-F13-P01" "1.25"
"S05-F13-P01:S08-F10-P01" "S08-F10-P01" "0.227"  "0.227"
"S05-F13-P01:S08-F11-P01" "S08-F11-P01" "-0.183" "-0.183" "-0.183"
"S05-F13-P01:S09-F66-P01" "S09-F66-P01" "-0.217" "-0.217" "-0.217"
"S05-F13-P01:S09-F67-P01" "S09-F67-P01"          "0.292"  "0.292"
"S05-F13-P01:S09-F68-P01" "S09-F68-P01"                   "-0.314"
"S05-F13-P01:S09-F01-P01" "S09-F01-P01"                   "0.0418"
+0

@RonakShah Please see the expected output. – MAPK

Answers

1

Two nested merges, with the extra column (type, the fourth column of bb) removed:

merge(merge(aa,bb[, -4], by=c("key.related.sheet", "sample_B") ,all=TRUE), 
     cc, by=c("key.related.sheet", "sample_B") ,all=TRUE) 

     key.related.sheet sample_B trace_1 trace_2 trace_3 
1 S05-F13-P01:S05-F13-P01 S05-F13-P01 1.25 <NA> <NA> 
2 S05-F13-P01:S08-F10-P01 S08-F10-P01 0.227 0.227 <NA> 
3 S05-F13-P01:S08-F11-P01 S08-F11-P01 -0.183 -0.183 -0.183 
4 S05-F13-P01:S09-F66-P01 S09-F66-P01 -0.217 -0.217 -0.217 
5 S05-F13-P01:S09-F67-P01 S09-F67-P01 <NA> 0.292 0.292 
6 S05-F13-P01:S09-F01-P01 S09-F01-P01 <NA> <NA> 0.0418 
7 S05-F13-P01:S09-F68-P01 S09-F68-P01 <NA> <NA> -0.314 
+1

Is this any different from @Patrick's answer? –

+0

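The only difference is removing the extraneous column. Before I did that, I was getting an error. I also didn't see Patrick's answer until after I had posted. –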

+0

Wouldn't using `Reduce()` instead of nested `merge()` calls be more scalable (and more readable)? – mtoto

1

You can convert the matrices to data.frames and join them with the full_join function from the dplyr package:

library(dplyr) 
for(i in c("aa","bb", "cc")) assign(i, data.frame(get(i))) 
aa %>% full_join(bb, by="key.related.sheet") %>% full_join(cc, 
by="key.related.sheet") 

     key.related.sheet sample_B.x trace_1 sample_B.y trace_2 type sample_B trace_3 
1 S05-F13-P01:S05-F13-P01 S05-F13-P01 1.25  <NA> <NA> <NA>  <NA> <NA> 
2 S05-F13-P01:S08-F10-P01 S08-F10-P01 0.227 S08-F10-P01 0.227 Unknown  <NA> <NA> 
3 S05-F13-P01:S08-F11-P01 S08-F11-P01 -0.183 S08-F11-P01 -0.183 Unknown S08-F11-P01 -0.183 
4 S05-F13-P01:S09-F66-P01 S09-F66-P01 -0.217 S09-F66-P01 -0.217 Unknown S09-F66-P01 -0.217 
5 S05-F13-P01:S09-F67-P01  <NA> <NA> S09-F67-P01 0.292 Unknown S09-F67-P01 0.292 
6 S05-F13-P01:S09-F68-P01  <NA> <NA>  <NA> <NA> <NA> S09-F68-P01 -0.314 
7 S05-F13-P01:S09-F01-P01  <NA> <NA>  <NA> <NA> <NA> S09-F01-P01 0.0418 
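
The join above produces sample_B.x and sample_B.y and keeps the type column, which the question did not ask for. A possible variation, sketched here on small stand-in data frames (aa_df, bb_df, cc_df and their values are made up for illustration, not the question's data), is to join on both shared columns and drop type first. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-ins for the question's matrices, already as data frames
aa_df <- data.frame(key.related.sheet = c("k1", "k2"),
                    sample_B = c("a", "b"),
                    trace_1 = c("1.25", "0.227"),
                    stringsAsFactors = FALSE)
bb_df <- data.frame(key.related.sheet = c("k2", "k3"),
                    sample_B = c("b", "c"),
                    trace_2 = c("0.227", "-0.183"),
                    type = "Unknown",
                    stringsAsFactors = FALSE)
cc_df <- data.frame(key.related.sheet = c("k3", "k4"),
                    sample_B = c("c", "d"),
                    trace_3 = c("-0.183", "0.292"),
                    stringsAsFactors = FALSE)

# Joining on both shared columns avoids the .x/.y duplicates
keys <- c("key.related.sheet", "sample_B")

result <- aa_df %>%
  full_join(select(bb_df, -type), by = keys) %>%  # drop the unwanted column
  full_join(cc_df, by = keys)

names(result)
```

This only makes sense as long as sample_B is fully determined by key.related.sheet, as it appears to be in the question's data.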
2

You can also do a full join with base R's merge function and all = TRUE.

> merge(merge(aa,bb,all=TRUE),cc,all=TRUE) 
     key.related.sheet sample_B trace_1 trace_2 type trace_3 
1 S05-F13-P01:S05-F13-P01 S05-F13-P01 1.25 <NA> <NA> <NA> 
2 S05-F13-P01:S08-F10-P01 S08-F10-P01 0.227 0.227 Unknown <NA> 
3 S05-F13-P01:S08-F11-P01 S08-F11-P01 -0.183 -0.183 Unknown -0.183 
4 S05-F13-P01:S09-F66-P01 S09-F66-P01 -0.217 -0.217 Unknown -0.217 
5 S05-F13-P01:S09-F67-P01 S09-F67-P01 <NA> 0.292 Unknown 0.292 
6 S05-F13-P01:S09-F01-P01 S09-F01-P01 <NA> <NA> <NA> 0.0418 
7 S05-F13-P01:S09-F68-P01 S09-F68-P01 <NA> <NA> <NA> -0.314 

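Here the merge is done on all common columns, i.e. key.related.sheet and sample_B, but that should be fine in this case, since sample_B depends on key.related.sheet.

With by="key.related.sheet" you get the same columns as in Adam's answer using dplyr. The merge is then done on key.related.sheet only, and the sample_B columns from both the left and the right join partner appear in the result (i.e. the data is duplicated).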
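
To make the difference between the two modes concrete, here is a minimal sketch on two small stand-in data frames (left, right and their values are hypothetical, not the question's data). It only compares the resulting column names.

```r
# Hypothetical stand-in data, much smaller than the question's matrices
left  <- data.frame(key.related.sheet = c("k1", "k2"),
                    sample_B = c("a", "b"),
                    trace_1 = c("1.25", "0.227"),
                    stringsAsFactors = FALSE)
right <- data.frame(key.related.sheet = c("k2", "k3"),
                    sample_B = c("b", "c"),
                    trace_2 = c("0.227", "-0.183"),
                    stringsAsFactors = FALSE)

# Default: merge() joins on every column name the two inputs share
by_all <- merge(left, right, all = TRUE)
names(by_all)

# Key only: the shared non-key column shows up twice, with .x/.y suffixes
by_key <- merge(left, right, by = "key.related.sheet", all = TRUE)
names(by_key)
```

The first call should give a single sample_B column, the second sample_B.x and sample_B.y.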

5

This can be done with a combination of Reduce and merge, as follows:

Reduce(function(x, y) merge(x, y, all=TRUE), list(aa, bb[,-4], cc)) 

Result:

 key.related.sheet sample_B trace_1 trace_2 trace_3 
1 S05-F13-P01:S05-F13-P01 S05-F13-P01 1.25 <NA> <NA> 
2 S05-F13-P01:S08-F10-P01 S08-F10-P01 0.227 0.227 <NA> 
3 S05-F13-P01:S08-F11-P01 S08-F11-P01 -0.183 -0.183 -0.183 
4 S05-F13-P01:S09-F66-P01 S09-F66-P01 -0.217 -0.217 -0.217 
5 S05-F13-P01:S09-F67-P01 S09-F67-P01 <NA> 0.292 0.292 
6 S05-F13-P01:S09-F01-P01 S09-F01-P01 <NA> <NA> 0.0418 
7 S05-F13-P01:S09-F68-P01 S09-F68-P01 <NA> <NA> -0.314 

Especially when you have more than three matrices/data frames, merge combined with Reduce scales better than nested merge calls.
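
As a rough illustration of that point, the sketch below merges four matrices held in a list. The helper make_mat and the keys "A:A" to "A:E" are invented for this example; only the column names follow the question.

```r
# Hypothetical helper: builds a 3-column character matrix shaped like aa/bb/cc
make_mat <- function(keys, trace_name, values) {
  m <- cbind(keys, sub("^.*:", "", keys), values)  # sample_B = part after ":"
  colnames(m) <- c("key.related.sheet", "sample_B", trace_name)
  m
}

mats <- list(
  make_mat(c("A:A", "A:B"), "trace_1", c("1.25", "0.227")),
  make_mat(c("A:B", "A:C"), "trace_2", c("0.227", "-0.183")),
  make_mat(c("A:C", "A:D"), "trace_3", c("-0.183", "0.292")),
  make_mat(c("A:D", "A:E"), "trace_4", c("0.292", "0.5"))
)

# One full outer join per list element; the call stays the same for any length
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), mats)
dim(merged)
```

Adding a fifth or sixth matrix only means adding an element to the list, whereas the nested form needs another level of merge() each time.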