2011-08-24 71 views
-4

Compare two lists and find the matching entries

I have a list like this:

C 
E 

I want to find these in the table below (Table 1) and write the matching rows to a second table (Table 2).

Does anyone have a Python or Perl script to do this?

Table 1:

A MU_ADO_2 1099 MU_ADO_2.1099 o o o o o o o o o o 7.82436 s_3_merged Suseptible A AG 2 4 0 2 0                    
A MU_ADO_2 1105 MU_ADO_2.1105 327.008 s_2_merged Resistance G GT 81 0 2 132 79 31.5281 s_6_merged Resistance G GT 8 0 1 8 7 34.9813 s_3_merged Suseptible G GT 7 0 0 3 7 7.82436 s_7_merged Suseptible G GT 2 0 0 4 2 
A MU_ADO_2 1110 MU_ADO_2.1110 515.963 s_2_merged Resistance A AT 113 96 1 2 110 31.5281 s_6_merged Resistance A AT 7 8 0 0 7 16.3388 s_3_merged Suseptible A AT 4 7 0 0 4 13.808 s_7_merged Suseptible A AT 3 3 0 0 3 
A MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
B MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
B MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
B MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 

Table 2:

C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0 
+1

What have you tried so far? What does "C E" mean? What are you trying to find? – daveydave400

+0

Now that your table has been edited (thanks, F.J), my only question is: what have you tried so far? – daveydave400

Answers

1

Alternatively, in Python:

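keys = ['C', 'E']
# Append every line of test.txt that starts with one of the keys to out.txt.
with open('out.txt', 'a') as out:
    with open('test.txt') as f:
        for line in f:
            for key in keys:
                if line.startswith(key):
                    out.write(line)
                    break  # stop after the first matching key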

test.txt is a file containing your Table 1, copied and pasted.
out.txt is the file that will contain your Table 2.

+0

You need a `break` after the `write` in your loop to make it more efficient. Or there is the equivalent two-liner in Python 2.7: `with open('out.txt', 'a') as out, open('test.txt') as f:` followed by `out.writelines(line for line in f if any(line.startswith(key) for key in keys))` – agf

+0

@agf, I have added a break. As for the rest, I prefer to keep the code as simple as possible, since the asker seems to be new to SO. – joaquin

+0

Yes, I'm not really recommending the golfed version; breaking out of the loop is good, +1. – agf

1

If your question is: "How can I filter this file to see only the entries whose first field equals C or E?"

then the following should work:

awk '$1 ~ /[CE]/ { print $0 }' yourfile > outfile 

If you want to save a few keystrokes at the cost of clarity, the following also works:

awk '$1 ~ /[CE]/' yourfile > outfile 
+0

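It takes all of three characters in Perl. So what? You also get infinitely better regexes, and a Real™ programming language. Also note that your code doesn't do what you say it does. Oops! – tchrist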

+0

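@tchrist Relax, Perl is better than awk. I'm not trying to start a holy war, and I'll delete the comment that upset you. As far as I can tell, though, this works; let me know what error you found. –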

+0

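The comment was just provocative, that's all. But your code tests whether the first field contains a C or an E, which is very different from saying `$1 == "C" || $1 == "E"`, and that is what your "first field equals C or E" claims. I'm not passing judgment on correctness, only pointing out that the description of the code doesn't match what the code does. A Perl solution is `perl -ne '/^[CE]/ && print'`, although I find `print if /^[CE]/` more readable. – tchrist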
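The distinction drawn in the comment above can be checked with a small Python sketch. The sample rows (`C row`, `ABC row`, and so on) and the helper names are invented for illustration and do not come from the question's data.

```python
import re

def first_field(line):
    """Return the first whitespace-separated field, or '' for a blank line."""
    parts = line.split(None, 1)
    return parts[0] if parts else ""

def contains_c_or_e(line):
    # Mirrors awk's `$1 ~ /[CE]/`: true if C or E occurs anywhere in the first field.
    return re.search(r"[CE]", first_field(line)) is not None

def equals_c_or_e(line):
    # Mirrors awk's `$1 == "C" || $1 == "E"`: true only on an exact match.
    return first_field(line) in {"C", "E"}

samples = ["C row", "E row", "ABC row", "D row"]  # hypothetical rows
contains_hits = [s for s in samples if contains_c_or_e(s)]
equals_hits = [s for s in samples if equals_c_or_e(s)]
```

With the single-letter labels shown in Table 1 both tests select the same rows; they only diverge once a first field such as `ABC` appears.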

3

Judging by the tags on the question, I assume you're open to other *nix utilities, so here is a sed solution:

sed '/^[^CE]/d' table1.txt > table2.txt 

This deletes every line of table1.txt that does not start with C or E.

0

Assuming the "C E" list comes from a file:

awk ' 
    FILENAME == ARGV[1] {list[$1]; next} 
    $1 in list {print} 
' list.txt table1 > table2 
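Since the question asked for Python, here is a rough equivalent of the two-file awk approach. It is a sketch under a few assumptions: the function names `filter_rows` and `filter_table` are made up here, the key file holds one key per line, and blank lines are skipped in both files.

```python
def filter_rows(keys, rows):
    """Yield the rows whose first whitespace-separated field is in keys."""
    wanted = set(keys)
    for row in rows:
        fields = row.split(None, 1)
        if fields and fields[0] in wanted:
            yield row

def filter_table(list_path, table_path, out_path):
    # Collect the first field of every non-blank line in the key file.
    with open(list_path) as key_file:
        keys = [line.split()[0] for line in key_file if line.strip()]
    # Copy only the matching rows of the table into the output file.
    with open(table_path) as table, open(out_path, "w") as out:
        out.writelines(filter_rows(keys, table))
```

A call such as `filter_table("list.txt", "table1", "table2")` would then play the role of the awk command line. Looking keys up in a set keeps each row check at constant cost, however long the key list is.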
3

How about grep:

grep -e '^[CE]' source.file 

You can also redirect the output to a new file:

grep -e '^[CE]' source.file > dest.file 
+0

Clean and simple! – flies

+0

Nice! The progression from `awk` to `sed` to `grep` keeps leading to simpler answers. –