2017-04-13 24 views

Arranging data rows in R

I have a list of OMIM genes with their corresponding diseases (about 15,000 genes) that looks like this:

SLC6A8,CRTR,CCDS1 Cerebral creatine deficiency syndrome 1, 300352 (3) 
BCAP31,BAP31,DXS1357E,DDCH Deafness, dystonia, and cerebral hypomyelination 
ABCD1,ALD,AMN Adrenoleukodystrophy, 300100 (3), X-linked recessive 
PLXNB3,PLXN6 NA 

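For some diseases, several gene names are associated with the same disease. I would like to reorganize this so that each row has only one gene name and its associated disease: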

SLC6A8 Cerebral creatine deficiency syndrome 1, 300352 (3) 
CRTR Cerebral creatine deficiency syndrome 1, 300352 (3) 
CCDS1 Cerebral creatine deficiency syndrome 1, 300352 (3) 

Is there a way to do this in R?

*"Is there a way to do this in R?"* - Most likely, but what have you tried so far? This is not a code-writing service. – nrussell

I have been using R for a few things, but in this case I don't know exactly what I need to look for. I only want a hint, not necessarily code! – VasoGene

Answers

I'm not entirely sure what kind of data structure you have. Here is a quick solution that will hopefully help with what you are looking for:

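library(plyr); library(magrittr)  # provide ldply() and %>%
# For row x, pair every comma-separated gene name in column a with the disease in column b
splitFn <- function(x) expand.grid(df[x,"a"] %>% as.character %>% strsplit(., ",") %>% unlist, df[x, "b"])
ldply(1:nrow(df), splitFn)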

       Var1 Var2
1    SLC6A8 Cerebral creatine deficiency syndrome 1, 300352(3)
2      CRTR Cerebral creatine deficiency syndrome 1, 300352(3)
3     CCDS1 Cerebral creatine deficiency syndrome 1, 300352(3)
4    BCAP31 Deafness, dystonia, and cerebral hypomyelination
5     BAP31 Deafness, dystonia, and cerebral hypomyelination
6  DXS1357E Deafness, dystonia, and cerebral hypomyelination
7      DDCH Deafness, dystonia, and cerebral hypomyelination
8     ABCD1 Adrenoleukodystrophy, 300100(3), X-linked recessive
9       ALD Adrenoleukodystrophy, 300100(3), X-linked recessive
10      AMN Adrenoleukodystrophy, 300100(3), X-linked recessive
11   PLXNB3 <NA>
12    PLXN6 <NA>

The data I used:

df <- structure(list(a = structure(c(4L, 2L, 1L, 3L), .Label = c("ABCD1,ALD,AMN", 
"BCAP31,BAP31,DXS1357E,DDCH", "PLXNB3,PLXN6", "SLC6A8,CRTR,CCDS1"
), class = "factor"), b = structure(c(1L, 3L, 2L, NA), .Label = c(" Cerebral creatine deficiency syndrome 1, 300352(3)", 
"Adrenoleukodystrophy, 300100(3), X-linked recessive", "Deafness, dystonia, and cerebral hypomyelination"
), class = "factor")), .Names = c("a", "b"), row.names = c(NA, 
-4L), class = "data.frame")
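
If you would rather avoid the extra packages, the same reshaping can probably be done in base R alone. The following is only a sketch under assumptions: the two-row data frame and the column names `gene` and `disease` are made up for illustration, and it assumes your real data has the gene names in one column and the disease in another, as in the answer above.

```r
# Hypothetical two-column layout: `a` holds comma-separated gene names, `b` the disease.
df <- data.frame(
  a = c("SLC6A8,CRTR,CCDS1", "PLXNB3,PLXN6"),
  b = c("Cerebral creatine deficiency syndrome 1, 300352 (3)", NA),
  stringsAsFactors = FALSE
)

# Split each gene string on commas; this gives a list with one character vector per row.
genes <- strsplit(as.character(df$a), ",", fixed = TRUE)

# Repeat each disease once per gene found in its row, then flatten the gene list.
long <- data.frame(
  gene    = unlist(genes),
  disease = rep(as.character(df$b), times = lengths(genes)),
  stringsAsFactors = FALSE
)
print(long)
```

Because `rep()` repeats whole rows' diseases by the number of genes in each row, rows whose disease is `NA` simply stay `NA`. If you already use the tidyverse, `tidyr::separate_rows(df, a, sep = ",")` should give a similar result in one call.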