2015-12-15

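Ignore NA values while pasting two column values in R

I have a data frame (strictly, a character matrix, judging by the structure() call below) called dd2. I need to paste together the values in Left.Gene.Symbols and Right.Gene.Symbols. I can do that with the simple code below, but when a value is missing I don't want NA pasted into the result. I want it to look like the combination column shown in the result below.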

My code

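# convert literal "NA" strings into real NA values
dd2[dd2 == 'NA'] <- NA
# paste the two gene-symbol columns together
# (a missing value comes out as the text "NA", e.g. "AK2*NA")
result <- cbind(dd2, combination = paste(dd2[, "Left.Gene.Symbols"], dd2[, "Right.Gene.Symbols"], sep = "*"))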
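A minimal illustration of why the NA appears: `paste()` converts a missing value into the two-character string "NA" before joining, instead of skipping it. The vectors `left` and `right` are hypothetical stand-ins for the two gene-symbol columns.

```r
# Hypothetical stand-ins for the two gene-symbol columns
left  <- c("AK2", "HFM1")
right <- c(NA, "PPT")

# paste() coerces NA to the string "NA" rather than dropping it
combined <- paste(left, right, sep = "*")
print(combined)
# combined is c("AK2*NA", "HFM1*PPT")
```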

Data

dd2<- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP", 
"AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT", 
NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id", 
"Left.Gene.Symbols", "Right.Gene.Symbols"))) 

Desired result

     customer_sample_id Left.Gene.Symbols Right.Gene.Symbols combination
[1,] "AMLM12001KP"      "AK2"             NA                 AK2*
[2,] "AMLM12001KP"      "HFM1"            "PPT"              HFM1*PPT
[3,] "AMLM12001KP"      "HFM1"            NA                 HFM1*
[4,] "AMLM12001KP"      "HFM1"            "GGT"              HFM1*GGT
[5,] "AMLM12001KP"      "HFM1"            NA                 HFM1*

@RonakShah Sorry, I've just corrected it. – MAPK

Answers


You can do this by temporarily replacing the NA values with the empty string "".

cbind(
    dd2, 
    combination = paste(dd2[,2], replace(dd2[,3], is.na(dd2[,3]), ""), sep = "*") 
) 
#      customer_sample_id Left.Gene.Symbols Right.Gene.Symbols combination
# [1,] "AMLM12001KP"      "AK2"             NA                 "AK2*"
# [2,] "AMLM12001KP"      "HFM1"            "PPT"              "HFM1*PPT"
# [3,] "AMLM12001KP"      "HFM1"            NA                 "HFM1*"
# [4,] "AMLM12001KP"      "HFM1"            "GGT"              "HFM1*GGT"
# [5,] "AMLM12001KP"      "HFM1"            NA                 "HFM1*"

Of course, you can use your column names in place of the column numbers above. I didn't write them out because they are so long.
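For reference, here is a sketch of the same answer written with the column names instead of numbers. The `dd2` object is copied from the question's data; `left` and `right` are illustrative helper names, not part of the original answer.

```r
# The example matrix from the question
dd2 <- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP",
  "AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT",
  NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id",
  "Left.Gene.Symbols", "Right.Gene.Symbols")))

left  <- dd2[, "Left.Gene.Symbols"]
right <- dd2[, "Right.Gene.Symbols"]

# Swap NA for "" in the right-hand column only, then paste
result <- cbind(dd2,
  combination = paste(left, replace(right, is.na(right), ""), sep = "*"))
result[, "combination"]
# "AK2*" "HFM1*PPT" "HFM1*" "HFM1*GGT" "HFM1*"
```

Only the pasted copy is altered; the original Right.Gene.Symbols column keeps its NA values in `result`.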


Thanks a lot. So if I also have NAs in column 2, can I just use replace(dd2[,2], is.na(dd2[,2]))? – MAPK


@MAPK - Yes, but the call is `replace(dd2[,2], is.na(dd2[,2]), "")`. You can also do it for the whole matrix if you like. –
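A sketch of the whole-matrix variant mentioned in the comment above, which covers NAs in either gene column at once. It assumes you only want the blanks in the pasted column, so the NA-free copy goes into a separate, hypothetically named `dd2_blank` and `dd2` itself keeps its NAs. Note that this blanks every NA in the matrix, including any in customer_sample_id.

```r
# The example matrix from the question
dd2 <- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP",
  "AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT",
  NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id",
  "Left.Gene.Symbols", "Right.Gene.Symbols")))

# Blank out every NA in the matrix at once; dd2 itself is not modified
dd2_blank <- replace(dd2, is.na(dd2), "")

combination <- paste(dd2_blank[, "Left.Gene.Symbols"],
                     dd2_blank[, "Right.Gene.Symbols"], sep = "*")
result <- cbind(dd2, combination = combination)
```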


One way, using ifelse:

ifelse(is.na(dd2[,3]),paste0(dd2[,2],"*"),paste(dd2[,2],dd2[,3],sep="*")) 

#[1] "AK2*"  "HFM1*PPT" "HFM1*" "HFM1*GGT" "HFM1*" 
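The `ifelse` above only checks column 3. A sketch of how the same idea could be extended to a missing left-hand symbol as well (the question raised in the comments); the vectors `left` and `right` are hypothetical and chosen to hit all four cases.

```r
# Hypothetical vectors covering every combination of missing values
left  <- c("AK2", NA,    "HFM1", NA)
right <- c(NA,    "PPT", "GGT",  NA)

# Test the "both missing" case first, then each side on its own
combination <- ifelse(is.na(left) & is.na(right), "*",
               ifelse(is.na(left),  paste0("*", right),
               ifelse(is.na(right), paste0(left, "*"),
                                    paste(left, right, sep = "*"))))
# combination is c("AK2*", "*PPT", "HFM1*GGT", "*")
```

The nested branches get hard to read quickly; replacing NA with "" before a single `paste()` gives the same result with less code.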

Can't use sub here, because some gene names contain NA, as in MPNA and TTNA. Wouldn't it remove the NA part? – MAPK


@MAPK Updated the answer. –
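The earlier sub()-based version of this answer is not shown, so the following is only a sketch of the pitfall MAPK describes, assuming the naive approach pastes first and then strips the text "NA". The gene symbol "MPNA" is taken from the comment; the pairing with a missing partner is hypothetical.

```r
# Hypothetical pair: "MPNA" has a missing partner, "HFM1" does not
pasted <- paste(c("MPNA", "HFM1"), c(NA, "PPT"), sep = "*")
# pasted is c("MPNA*NA", "HFM1*PPT")

# Naive: sub() removes the FIRST "NA" it finds, which is inside the gene name
naive <- sub("NA", "", pasted)
# naive is c("MP*NA", "HFM1*PPT")

# Anchoring on "*NA" at the end of the string leaves the symbol intact,
# but it still only handles a missing right-hand value
anchored <- sub("\\*NA$", "*", pasted)
# anchored is c("MPNA*", "HFM1*PPT")
```

Checking `is.na()` on the values before pasting, as the updated answer does, avoids this kind of string surgery altogether.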


We can use NAer from qdap together with sprintf:

library(qdap) 
sprintf('%s*%s', dd2[,2],NAer(dd2[,3],'')) 
#[1] "AK2*"  "HFM1*PPT" "HFM1*" "HFM1*GGT" "HFM1*"
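If installing qdap is not an option, a base-R sketch of the same sprintf approach could blank the NAs with `ifelse()` instead of `NAer()`. The name `right_blank` is illustrative, and `dd2` is copied from the question's data.

```r
# The example matrix from the question
dd2 <- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP",
  "AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT",
  NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id",
  "Left.Gene.Symbols", "Right.Gene.Symbols")))

# Base-R stand-in for NAer(x, ''): replace missing values with ""
right_blank <- ifelse(is.na(dd2[, 3]), "", dd2[, 3])

# Without the replacement, sprintf() would format a missing value as "NA"
combination <- sprintf("%s*%s", dd2[, 2], right_blank)
# combination is c("AK2*", "HFM1*PPT", "HFM1*", "HFM1*GGT", "HFM1*")
```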