2015-11-02

Pairwise analysis in R

I have a large data frame in which, for each pair of individuals, I need to find the columns where the two rows are equal.

Here is a sample of the data frame:

>data 
    ID pos1234 pos1345 pos1456 pos1678 
1 1  C  A  C  G 
2 2  C  G  A  G 
3 3  C  A  G  A 
4 4  C  G  C  T 

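I transformed the data frame into a pairwise matrix, in which each consecutive pair of rows holds one pair of individuals: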

apply(data, 2, combn, m=2) 


     ID pos1234 pos1345 pos1456 pos1678 
[1,] "1" "C"  "A"  "C"  "G" 
[2,] "2" "C"  "G"  "A"  "G" 
[3,] "1" "C"  "A"  "C"  "G" 
[4,] "3" "C"  "A"  "G"  "A" 
[5,] "1" "C"  "A"  "C"  "G" 
[6,] "4" "C"  "G"  "C"  "T" 
[7,] "2" "C"  "G"  "A"  "G" 
[8,] "3" "C"  "A"  "G"  "A" 
[9,] "2" "C"  "G"  "A"  "G" 
[10,] "4" "C"  "G"  "C"  "T" 
[11,] "3" "C"  "A"  "G"  "A" 
[12,] "4" "C"  "G"  "C"  "T" 

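I am now having trouble identifying the columns that contain the same letter for both members of a pair. For example, for the pair 1 and 2, the columns containing the same letter would be pos1234 and pos1678.

Is it possible to get, for each pair of individuals, a data frame of the columns in which they share the same letter?

Thanks in advance.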

Answer

You can pass a function to combn, so that it is applied to each pair of row indices:

# lapply (rather than sapply) keeps a list even when every column differs within a pair
res0 <- combn(nrow(data), 2, FUN = function(x) 
    names(data[-1])[ lengths(lapply(data[x,-1], unique)) == 1 ], simplify=FALSE) 

which gives

[[1]] 
[1] "pos1234" "pos1678" 

[[2]] 
[1] "pos1234" "pos1345" 

[[3]] 
[1] "pos1234" "pos1456" 

[[4]] 
[1] "pos1234" 

[[5]] 
[1] "pos1234" "pos1345" 

[[6]] 
[1] "pos1234" 
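To make the per-pair step easier to follow, here is a minimal sketch of what the function does for a single pair, rows 1 and 2. The data frame is rebuilt from the sample in the question; the names `pair_rows`, `n_unique` and `same_cols` are introduced only for illustration. It uses lapply rather than sapply so that the intermediate result stays a list regardless of how many distinct values each column has.

```r
# Sample data rebuilt from the question (letters kept as character, not factor)
data <- data.frame(
  ID      = 1:4,
  pos1234 = c("C", "C", "C", "C"),
  pos1345 = c("A", "G", "A", "G"),
  pos1456 = c("C", "A", "G", "C"),
  pos1678 = c("G", "G", "A", "T"),
  stringsAsFactors = FALSE
)

x <- c(1, 2)                 # one pair of row indices, as combn would supply it
pair_rows <- data[x, -1]     # the two rows, without the ID column

# Number of distinct letters per column: 1 means both individuals agree
n_unique <- lengths(lapply(pair_rows, unique))

same_cols <- names(pair_rows)[n_unique == 1]
same_cols
# [1] "pos1234" "pos1678"
```

combn simply repeats this step for each of the six row pairs and collects the results in a list.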

To find out which pairs [[1]] through [[6]] correspond to, use combn again, this time on the IDs:

res <- setNames(res0, combn(data$ID, 2, paste, collapse=".")) 

This gives

$`1.2` 
[1] "pos1234" "pos1678" 

$`1.3` 
[1] "pos1234" "pos1345" 

$`1.4` 
[1] "pos1234" "pos1456" 

$`2.3` 
[1] "pos1234" 

$`2.4` 
[1] "pos1234" "pos1345" 

$`3.4` 
[1] "pos1234"
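Since the question asks for a data frame, one possible way to flatten the named list is base R's stack(), which yields one row per (pair, column) combination. This is only a sketch under the assumption that a long format is acceptable; the names `res_df`, `column` and `pair` are chosen here for illustration and do not come from the original answer.

```r
# Sample data rebuilt from the question (letters kept as character, not factor)
data <- data.frame(
  ID      = 1:4,
  pos1234 = c("C", "C", "C", "C"),
  pos1345 = c("A", "G", "A", "G"),
  pos1456 = c("C", "A", "G", "C"),
  pos1678 = c("G", "G", "A", "T"),
  stringsAsFactors = FALSE
)

# Matching columns for every pair of rows (lapply keeps the result a list)
res0 <- combn(nrow(data), 2, FUN = function(x)
  names(data[-1])[lengths(lapply(data[x, -1], unique)) == 1],
  simplify = FALSE)

# Label each list element with the IDs of its pair, e.g. "1.2"
res <- setNames(res0, combn(data$ID, 2, paste, collapse = "."))

# stack() converts a named list of vectors into a two-column data frame
res_df <- stack(res)
names(res_df) <- c("column", "pair")
res_df
```

Each row of `res_df` names one column in which the two individuals of `pair` carry the same letter. A pair with no matching column would contribute no rows.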