2017-02-13

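Remove data frame rows where a cell contains more than one character

I have a data frame with 8 columns and a great many rows. I want to remove the rows that contain more than one character in column 6 or 7 (Reference and Alternate), and output a data frame that has only a single character in each of columns 6 and 7.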

DF:

ID Content_ID Chromosome Start Stop Reference Alternate Length 
1299675221 backbone 12 99675221 99675221 GG T 0 
1298583685 backbone 12 98583685 98583685 C T 0 
129833474 backbone 12 9833474  9833474  C T 0 
1297722695 backbone 12 97722695 97722695 A G 0 
1297381269 backbone 12 97381269 97381269 T C 0 
1297081605 backbone 12 97081605 97081605 G AA 0 
1297058068 backbone 12 97058068 97058068 T C 0 
1295891848 backbone 12 95891848 95891848 CCTT ATA 0 
1294164312 backbone 12 94164312 94164312 T C 0 
12940191 backbone 12 940191  940191  T C 0 

Desired output:

ID Content_ID Chromosome Start Stop Reference Alternate Length 
1298583685 backbone 12 98583685 98583685 C T 0 
129833474 backbone 12 9833474  9833474  C T 0 
1297722695 backbone 12 97722695 97722695 A G 0 
1297381269 backbone 12 97381269 97381269 T C 0 
1297058068 backbone 12 97058068 97058068 T C 0 
1294164312 backbone 12 94164312 94164312 T C 0 
12940191 backbone 12 940191  940191  T C 0 

Answers


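We can loop over columns 6 and 7 with lapply and check whether the number of characters is 1, then use Reduce with `&` to compare the corresponding elements and get a single logical vector, which is used to subset 'df':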

df[Reduce(`&`, lapply(df[6:7], function(x) nchar(x)==1)),] 
#  ID Content_ID Chromosome Start  Stop Reference Alternate Length 
#2 1298583685 backbone   12 98583685 98583685   C   T  0 
#3 129833474 backbone   12 9833474 9833474   C   T  0 
#4 1297722695 backbone   12 97722695 97722695   A   G  0 
#5 1297381269 backbone   12 97381269 97381269   T   C  0 
#7 1297058068 backbone   12 97058068 97058068   T   C  0 
#9 1294164312 backbone   12 94164312 94164312   T   C  0 
#10 12940191 backbone   12 940191 940191   T   C  0 
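
To make the intermediate steps visible, here is a minimal sketch on four rows taken from the question's data. The names `checks`, `keep` and `result` are only illustrative, and the `as.character()` call is an added assumption: it guards against the columns having been read in as factors, in which case `nchar()` would raise an error.

```r
# Four rows from the question, rebuilt by hand (a scalar is recycled per column).
df <- data.frame(
  ID         = c(1299675221, 1298583685, 1297081605, 1295891848),
  Content_ID = "backbone",
  Chromosome = 12,
  Start      = c(99675221, 98583685, 97081605, 95891848),
  Stop       = c(99675221, 98583685, 97081605, 95891848),
  Reference  = c("GG", "C", "G", "CCTT"),
  Alternate  = c("T", "T", "AA", "ATA"),
  Length     = 0,
  stringsAsFactors = FALSE
)

# One logical vector per column: TRUE where the cell holds exactly one character.
checks <- lapply(df[6:7], function(x) nchar(as.character(x)) == 1)
# Reference: FALSE TRUE TRUE  FALSE
# Alternate: TRUE  TRUE FALSE FALSE

# Combine the per-column vectors element-wise; a row survives only if both are TRUE.
keep <- Reduce(`&`, checks)
# FALSE TRUE FALSE FALSE

result <- df[keep, ]
```

Because Reduce folds `&` over the whole list, the same line keeps working if more columns are added to the selection, for example `df[6:8]`.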

Another option is rowSums:

df[!rowSums(nchar(as.matrix(df[6:7]))!=1),] 
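
The rowSums one-liner packs several steps together. The sketch below unpacks it on a small data frame with only the two relevant columns; the variable names and the reduced data are assumptions made for illustration.

```r
# Only the two columns of interest, with values taken from the question.
df <- data.frame(
  Reference = c("GG", "C", "G", "CCTT"),
  Alternate = c("T", "T", "AA", "ATA"),
  stringsAsFactors = FALSE
)

# as.matrix() gives a character matrix; nchar() keeps its dimensions.
m <- as.matrix(df[1:2])

# Logical matrix: TRUE marks a cell whose length is not exactly 1.
bad <- nchar(m) != 1

# Number of offending cells in each row: 1 0 1 2
n_bad <- rowSums(bad)

# `!` turns a count of 0 into TRUE and any positive count into FALSE,
# so only rows with no offending cell are kept.
result <- df[!n_bad, ]
```

Since `as.matrix()` converts factor columns to character, this variant also works when the columns were read in as factors.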

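Similarly, you can paste the columns together and keep the rows where the number of characters equals 3: one for each column plus a space.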

df[nchar(paste(df$Reference, df$Alternate)) == 3,] 
      ID Content_ID Chromosome Start  Stop Reference Alternate Length 
2 1298583685 backbone   12 98583685 98583685   C   T  0 
3 129833474 backbone   12 9833474 9833474   C   T  0 
4 1297722695 backbone   12 97722695 97722695   A   G  0 
5 1297381269 backbone   12 97381269 97381269   T   C  0 
7 1297058068 backbone   12 97058068 97058068   T   C  0 
9 1294164312 backbone   12 94164312 94164312   T   C  0 
10 12940191 backbone   12 940191 940191   T   C  0 
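
One thing to be aware of with the pasted-length check: it tests the combined length, not each column separately. The sketch below uses made-up vectors (the empty string does not occur in the question's data) to show where the two tests can disagree.

```r
# Hypothetical values; the third pair has an empty Reference and a two-letter Alternate.
ref <- c("C", "GG", "", "T")
alt <- c("T", "T", "AA", "C")

pasted <- paste(ref, alt)
# "C T"  "GG T" " AA"  "T C"

# Combined-length test: the third pair also has 3 characters in total.
by_paste <- nchar(pasted) == 3
# TRUE FALSE TRUE TRUE

# Per-column test: each value must be exactly one character long.
by_column <- nchar(ref) == 1 & nchar(alt) == 1
# TRUE FALSE FALSE TRUE
```

If empty cells cannot occur in the data, both tests select the same rows.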

It is as simple as this using data.table:

library(data.table) 

setDT(df) 
df <- df[ nchar(Reference)==1 & nchar(Alternate)==1]