2016-07-05
1

How to make combinations based on strings

I have a data frame like the one below:

Column1   Column2   Column3 
Q9Y6Y8    P28074   Q9Y6A4 
Q9Y6W5    P28066   Q9Y623 
Q9Y6H1    P27695   Q9Y5W9 
Q5T1J5    P25786;Q9Y623 
Q9Y6A4 
Q9Y623;P27695;Q9Y623 
Q9Y5W9 
Q9Y6Y8 

I want to combine the columns of this data frame: first put all the values together and keep only the unique ones, like below:

Q9Y6Y8       
Q9Y6W5      
Q9Y6H1      
Q5T1J5    
Q9Y6A4 
Q9Y623 
P27695 
Q9Y623 
Q9Y5W9 
Q9Y6Y8 
P25786 
P28074 
P28066 

Then I want all combinations of the strings, as shown below:

Q9Y6Y8 Q9Y6W5 
Q9Y6Y8 Q9Y6H1      
Q9Y6Y8 Q9Y6A4       
Q9Y6Y8 Q5T1J5    
Q9Y6Y8 Q9Y6A4 
Q9Y6Y8 Q9Y623 
Q9Y6Y8 P27695 
Q9Y6Y8 Q9Y623 
    . 
    . 
    . 
Q9Y6W5 Q9Y6H1 
Q9Y6W5 Q9Y6A4 
Q9Y6W5 Q5T1J5 
    . 
    . 
    . 

and so on, until every string has been paired once with every other string.

Answer

3

We can do this by unlisting the data.frame (a data.frame is a list) into a vector, splitting each element on ";", unlisting the list that strsplit returns, and then taking the unique elements as a vector:

Un1 <- unique(unlist(strsplit(unlist(df1), ";"))) 
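The one-liner above can be unpacked step by step, which also shows why unlist appears twice. This is only a sketch: df1 is rebuilt here from the table in the question, and filling the blank cells with empty strings is an assumption (the question does not say how they were read in). The intermediate names flat, pieces and ids are illustrative.

```r
# Assumed reconstruction of the question's data; blank cells are taken
# to be empty strings, which is a guess about how the file was read.
df1 <- data.frame(
  Column1 = c("Q9Y6Y8", "Q9Y6W5", "Q9Y6H1", "Q5T1J5", "Q9Y6A4",
              "Q9Y623;P27695;Q9Y623", "Q9Y5W9", "Q9Y6Y8"),
  Column2 = c("P28074", "P28066", "P27695", "P25786;Q9Y623",
              "", "", "", ""),
  Column3 = c("Q9Y6A4", "Q9Y623", "Q9Y5W9", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# 1st unlist: a data.frame is a list of columns, so this flattens it
# into one character vector, column by column.
flat <- unlist(df1)

# strsplit returns a list with one character vector per cell;
# an empty cell yields character(0).
pieces <- strsplit(flat, ";")

# 2nd unlist: flatten that list back into a single character vector
# (the character(0) entries simply disappear).
ids <- unlist(pieces)

# Drop repeated identifiers, keeping first-appearance order.
Un1 <- unique(ids)
```

The first unlist works on the data.frame, the second on the list produced by strsplit, so both are needed.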

From this vector, we can use expand.grid

expand.grid(Un1, Un1) 

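to get all the combinations (every ordered pair, including each string paired with itself). Or, if we only need each unordered pair once, we can use combn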

t(combn(Un1, 2)) 
#      [,1]     [,2]
# [1,] "Q9Y6Y8" "Q9Y6W5" 
# [2,] "Q9Y6Y8" "Q9Y6H1" 
# [3,] "Q9Y6Y8" "Q5T1J5" 
# [4,] "Q9Y6Y8" "Q9Y6A4" 
# [5,] "Q9Y6Y8" "Q9Y623" 
# [6,] "Q9Y6Y8" "P27695" 
# [7,] "Q9Y6Y8" "Q9Y5W9" 
# [8,] "Q9Y6Y8" "P28074" 
# [9,] "Q9Y6Y8" "P28066" 
#[10,] "Q9Y6Y8" "P25786" 
#[11,] "Q9Y6W5" "Q9Y6H1" 
#[12,] "Q9Y6W5" "Q5T1J5" 
#[13,] "Q9Y6W5" "Q9Y6A4" 
#[14,] "Q9Y6W5" "Q9Y623" 
#[15,] "Q9Y6W5" "P27695" 
#[16,] "Q9Y6W5" "Q9Y5W9" 
#[17,] "Q9Y6W5" "P28074" 
#[18,] "Q9Y6W5" "P28066" 
#[19,] "Q9Y6W5" "P25786" 
#[20,] "Q9Y6H1" "Q5T1J5" 
#[21,] "Q9Y6H1" "Q9Y6A4" 
#[22,] "Q9Y6H1" "Q9Y623" 
#[23,] "Q9Y6H1" "P27695" 
#[24,] "Q9Y6H1" "Q9Y5W9" 
#[25,] "Q9Y6H1" "P28074" 
#[26,] "Q9Y6H1" "P28066" 
#[27,] "Q9Y6H1" "P25786" 
#[28,] "Q5T1J5" "Q9Y6A4" 
#[29,] "Q5T1J5" "Q9Y623" 
#[30,] "Q5T1J5" "P27695" 
#[31,] "Q5T1J5" "Q9Y5W9" 
#[32,] "Q5T1J5" "P28074" 
#[33,] "Q5T1J5" "P28066" 
#[34,] "Q5T1J5" "P25786" 
#[35,] "Q9Y6A4" "Q9Y623" 
#[36,] "Q9Y6A4" "P27695" 
#[37,] "Q9Y6A4" "Q9Y5W9" 
#[38,] "Q9Y6A4" "P28074" 
#[39,] "Q9Y6A4" "P28066" 
#[40,] "Q9Y6A4" "P25786" 
#[41,] "Q9Y623" "P27695" 
#[42,] "Q9Y623" "Q9Y5W9" 
#[43,] "Q9Y623" "P28074" 
#[44,] "Q9Y623" "P28066" 
#[45,] "Q9Y623" "P25786" 
#[46,] "P27695" "Q9Y5W9" 
#[47,] "P27695" "P28074" 
#[48,] "P27695" "P28066" 
#[49,] "P27695" "P25786" 
#[50,] "Q9Y5W9" "P28074" 
#[51,] "Q9Y5W9" "P28066" 
#[52,] "Q9Y5W9" "P25786" 
#[53,] "P28074" "P28066" 
#[54,] "P28074" "P25786" 
#[55,] "P28066" "P25786" 
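To make the difference between the two functions concrete, here is a small sketch on a toy vector. The three-letter ids vector and the column names first and second are made up for illustration and are not from the question.

```r
# Toy identifiers, purely illustrative.
ids <- c("A", "B", "C")

# expand.grid: every ordered pair, including self-pairs such as (A, A),
# so it returns length(ids)^2 rows.
all_pairs <- expand.grid(first = ids, second = ids,
                         stringsAsFactors = FALSE)

# combn: each unordered pair exactly once, no self-pairs,
# so it returns choose(length(ids), 2) rows after transposing.
unordered <- t(combn(ids, 2))

n_all <- nrow(all_pairs)        # 3^2
n_unordered <- nrow(unordered)  # choose(3, 2)
```

With the 11 unique identifiers from the question, combn produces choose(11, 2) = 55 rows, which matches the output listed above, while expand.grid would produce 121.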

Note: Here, I am assuming that the columns are all of class character.
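If the columns are factors instead (the default for data.frame and read.table before R 4.0.0), strsplit will refuse the input. A minimal sketch of the workaround, using a small made-up data frame df_fac rather than the real data:

```r
# Hypothetical two-column example with factor columns.
df_fac <- data.frame(
  Column1 = c("Q9Y6Y8", "Q9Y623;P27695"),
  Column2 = c("P28074", ""),
  stringsAsFactors = TRUE
)

# unlist() on factor columns returns a factor, and strsplit() only
# accepts character input, so this call signals an error.
res <- tryCatch(strsplit(unlist(df_fac), ";"),
                error = function(e) "error")

# Converting to character first makes the same pipeline work.
Un_fac <- unique(unlist(strsplit(as.character(unlist(df_fac)), ";")))
```

An alternative with the same effect is to convert the columns up front with df_fac[] <- lapply(df_fac, as.character).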

+0

@nik Your columns are of class factor, so use 'strsplit(as.character(unlist(df1)), ";")' – akrun

+1

I like your answer, but I have to wait 2 minutes before I can accept it – nik

+0

Could you please add some description? Why do you use unlist twice? – nik