2012-10-29 34 views
0

I have two files that I want to join/merge on columns 1 and 2, and then concatenate their last columns when the keys match.

Input 1

22 42960000 rs149201999 A AC 100 PASS LDAF=0.0649;RSQ=0.8652;AN=2184;ERATE=0.0046;VT=SNP;AA=.;AVGPOST=0.9799;THETA=0.0149;SNPSOURCE=LOWCOV;AC=134;AF=0.06;ASN_AF=0.04;AMR_AF=0.05;AFR_AF=0.10;EUR_AF=0.06 

Input 2

22 42960000 . A AC . . ;AA=1;AFE=0.989691;ASNE=1;EUN=0.992509;AFW=1;MED=0.991071;LAT=1 

And the output would be:

22 42960000 . A AC . . ;AA=1;AFE=0.989691;ASNE=1;EUN=0.992509;AFW=1;MED=0.991071;LAT=1;LDAF=0.0649;RSQ=0.8652;AN=2184;ERATE=0.0046;VT=SNP;AA=.;AVGPOST=0.9799;THETA=0.0149;SNPSOURCE=LOWCOV;AC=134;AF=0.06;ASN_AF=0.04;AMR_AF=0.05;AFR_AF=0.10;EUR_AF=0.06 

Note that the columns are tab-separated.

+1

What have you tried? (http://www.whathaveyoutried.com/) –

Answers

0

Here's one way using GNU awk:

awk 'FNR==NR { array[$1,$2]=$8; next } ($1,$2) in array { print $0 ";" array[$1,$2] }' input1 input2 

Result:

22 42960000 . A AC . . ;AA=1;AFE=0.989691;ASNE=1;EUN=0.992509;AFW=1;MED=0.991071;LAT=1;LDAF=0.0649;RSQ=0.8652;AN=2184;ERATE=0.0046;VT=SNP;AA=.;AVGPOST=0.9799;THETA=0.0149;SNPSOURCE=LOWCOV;AC=134;AF=0.06;ASN_AF=0.04;AMR_AF=0.05;AFR_AF=0.10;EUR_AF=0.06 
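
The one-liner may be easier to follow when spread out with comments. The following is only a sketch: `merge_info` is a hypothetical wrapper name, and it sets the field separator to a tab (which the original command does not do), on the assumption that the inputs are strictly tab-separated as the question states.

```shell
# merge_info is a hypothetical wrapper; pass the two files as arguments.
merge_info() {
    awk -F'\t' '
        # First file (FNR == NR only while reading it): remember the INFO
        # column (field 8), keyed by columns 1 and 2.
        FNR == NR { info[$1, $2] = $8; next }
        # Second file: for keys seen in the first file, print the line
        # followed by ";" and the stored INFO.
        ($1, $2) in info { print $0 ";" info[$1, $2] }
    ' "$1" "$2"
}
```

It would be called as `merge_info input1 input2 > output`. Lines of the second file whose key is absent from the first file are dropped, just as in the one-liner.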
+0

The code above works fine, but I don't want **AA=.** from the input1 file to appear in the output file – AKR

+0

@user1782877: Try: awk 'FNR==NR { sub(/AA=\.;/,""); array[$1,$2]=$8; next } ...' – Steve

+0

Thanks Steve :) **gzip -dc input1.vcf.gz input2.vcf.gz | awk 'FNR==NR { sub(/AA=\.;/,""); array[$1,$2]=$8; next } ($1,$2) in array { print $0 ";" array[$1,$2] }' | gzip > output.vcf.gz** I tried this command, but it did not generate any output – AKR
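
A likely reason for the empty output: `gzip -dc input1.vcf.gz input2.vcf.gz` writes both files to a single stream, so awk sees only one input, `FNR==NR` holds for every record, and every line hits `next`. Below is a minimal sketch of one way around that, assuming a shell with `mktemp` available. `merge_vcf_gz` is a hypothetical helper name, not something from the answer.

```shell
# merge_vcf_gz is a hypothetical helper: merge_vcf_gz first.vcf.gz second.vcf.gz
merge_vcf_gz() {
    # Decompress the first file to a temporary file so awk reads it as a
    # separate input; the second file arrives on stdin ("-").
    tmp=$(mktemp) || return 1
    gzip -dc "$1" > "$tmp"
    gzip -dc "$2" |
        awk 'FNR==NR { sub(/AA=\.;/, ""); array[$1,$2]=$8; next }
             ($1,$2) in array { print $0 ";" array[$1,$2] }' "$tmp" - |
        gzip
    rm -f "$tmp"
}
```

It would be called as `merge_vcf_gz input1.vcf.gz input2.vcf.gz > output.vcf.gz`. In bash, passing `<(gzip -dc input1.vcf.gz) <(gzip -dc input2.vcf.gz)` as the two file arguments to awk should serve the same purpose without the temporary file.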

0

This should work:

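# Placeholder assumed not to occur in the data; it glues columns 1 and 2 into one join key.
s=%%%%%%
# join expects both inputs to be sorted on the key.
join -j1 -o1.1,1.2,1.3,1.4,1.5,1.6,1.7,2.7 <(sed "s/\t/$s/" input2) \
              <(sed "s/\t/$s/" input1) \
| sed "s/$s/\t/;
     s/\(=[^ ]*\) \([^ ]*=\)/\1;\2/;
     s/ \+/\t/g"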