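I'm new to Perl and just experimenting with some small, messy code. How can I merge and process multiple lines from a file with Perl to produce a report?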
cat input1.txt
##gff-version 2
##source-version geneious 5.6.4
Xm_ABL1 Geneious CDS 1 168 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 169 334 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 335 628 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 629 901 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 902 985 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 986 1165 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 1166 1350 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious CDS 1351 1504 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
Xm_ABL1 Geneious BLAST Hit 169 334 . + .
Xm_ABL1 Geneious extracted region 1 168 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="351297 -> 351464"
Xm_ABL1 Geneious extracted region 169 334 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="371785 -> 371950"
Xm_ABL1 Geneious extracted region 335 628 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="372554 -> 372847"
Xm_ABL1 Geneious extracted region 629 901 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="374760 -> 375032"
Xm_ABL1 Geneious extracted region 902 985 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="375230 -> 375313"
Xm_ABL1 Geneious extracted region 986 1165 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="375992 -> 376171"
Xm_ABL1 Geneious extracted region 1166 1350 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="376575 -> 376759"
Xm_ABL1 Geneious extracted region 1351 1504 . + . Name=Extracted region from gi|371443098|gb|JH556762.1|;Extracted interval="376914 -> 377067"
If the input file contains a forward arrow (->), I want output like the following: if ($array[7] =~ /.*interval=\"\d+ -> \d+\"$/gm) { $array[5] = "+"; }
cat output1.txt
gi_371443098_gb_JH556762.1 gene 351297 377067 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 351297 351464 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 371785 371950 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 372554 372847 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 374760 375032 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 375230 375313 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 375992 376171 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 376575 376759 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 376914 377067 . + . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
###
If the input file contains a reverse arrow (<-): if ($array[7] =~ /.*interval=\"\d+ <- \d+\"$/gm) { $array[5] = "-"; }
cat output1.txt
gi_371443098_gb_JH556762.1 gene 351297 377067 . - . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 351297 351464 . - . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 371785 371950 . - . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 372554 372847 . - . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 374760 375032 . - . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 375230 375313 . - . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 375992 376171 . - . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 376575 376759 . - . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
gi_371443098_gb_JH556762.1 CDS 376914 377067 . - . Name=Xm_ABL1;created by=User;modified by=User;ID=w0IVHutPuN4H4FVDCg4sFVRaJjQ.1340919460469.4
###
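Each of the two `if` tests above recognizes only one arrow direction, and neither keeps the genomic coordinates. Here is a minimal sketch that reads both coordinates and the strand in a single match. It assumes the attribute always has the form `interval="START -> END"` or `interval="START <- END"`, and `parse_interval` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: parse the "Extracted interval" attribute of an
# "extracted region" row. Returns (start, end, strand), where strand is
# "+" for a forward arrow (->) and "-" for a reverse arrow (<-).
# Returns an empty list (undef in scalar context) if there is no interval.
sub parse_interval {
    my ($attributes) = @_;
    my ($start, $arrow, $end) =
        $attributes =~ /interval="(\d+)\s*(->|<-)\s*(\d+)"/
        or return;
    return ($start, $end, $arrow eq '->' ? '+' : '-');
}

my $attr = 'Name=Extracted region from gi|371443098|gb|JH556762.1|;'
         . 'Extracted interval="351297 -> 351464"';
my ($start, $end, $strand) = parse_interval($attr);
print "$start\t$end\t$strand\n";    # tab-separated: 351297, 351464, +
```

Because the arrow is captured rather than hard-coded, the same call handles both input variants. The caller then only has to decide what to do when different rows disagree.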
I've tried some small, messy code, since I'm a beginner:
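#!/usr/bin/perl
use strict;
use warnings;

# Usage: perl script.pl input1.txt
# The input is assumed to be tab-separated GFF:
# seqid source type start end score strand frame attributes
my $file = shift @ARGV or die "usage: $0 input1.txt\n";
open(my $fh, '<', $file) or die "Cannot open $file: $!\n";

my (%cds, %extract);    # both keyed by "seqid\tstart\tend"
while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^#/;              # skip the ##gff-version header lines
    my @array = split /\t/, $line;
    next if @array < 9;                 # skip rows without attributes (e.g. BLAST Hit)
    my $key = join "\t", @array[0, 3, 4];
    if ($array[2] eq 'CDS') {
        # keep score, strand, frame and attributes for the output
        $cds{$key} = join "\t", @array[5 .. 8];
    }
    elsif ($array[2] eq 'extracted region') {
        my ($pos1, $pos2) = $array[8] =~ /interval="(\d+) -> (\d+)"/;
        $extract{$key} = "$pos1\t$pos2" if defined $pos2;
    }
}
close $fh;

# Print each CDS with the coordinates of its matching extracted region,
# sorted numerically by the CDS start position.
for my $key (sort { (split /\t/, $a)[1] <=> (split /\t/, $b)[1] } keys %cds) {
    next unless exists $extract{$key};
    my ($seqid) = split /\t/, $key;
    print "$seqid\tCDS\t$extract{$key}\t$cds{$key}\n";
}
print "###\n";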
Update
What do I need to change in my code to do the following?
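1. Take the value from the extracted region rows (i.e. $array[7] =~ /gi|371443098|gb|JH556762.1|/), which can be any value, replace its pipes with underscores (i.e. gi_371443098_gb_JH556762.1), and print it as $array[0] in output1.txt, as shown.
2. Print a new first line (gi_371443098_gb_JH556762.1 gene ...) whose 3rd column is the start of the first CDS (i.e. 351297) and whose 4th column is the end of the last CDS (i.e. 377067), as shown in output1.txt.
3. If all rows of the extracted region block have a forward arrow, e.g. Extracted interval="351297 -> 351464", print $array[5] as "+", including on the gene header line. If they have a reverse arrow, e.g. Extracted interval="351297 <- 351464", print $array[5] as "-", again including on the gene header line.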
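The three changes listed above can be combined in one script. This is a hedged sketch, not a definitive solution. It assumes tab-separated input with the source column (`Geneious`) at index 1. It also assumes that each CDS row has an extracted region row with the same local start/end, and that the file holds a single gene. `clean_seqid` and `build_report` are hypothetical helper names. The gene span is taken from the smallest remapped start and the largest remapped end. If the arrows disagree, the script stops with an error instead of guessing a strand.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max);

# Sketch only. Assumptions (not confirmed by the question):
#   * columns are tab-separated: seqid source type start end score strand frame attributes
#   * every CDS row has an "extracted region" row with the same start/end
#   * the file describes a single gene

# Item 1: "gi|371443098|gb|JH556762.1|" -> "gi_371443098_gb_JH556762.1"
sub clean_seqid {
    my ($id) = @_;
    $id =~ s/\|+$//;    # drop the trailing pipe(s)
    $id =~ tr/|/_/;     # turn the remaining pipes into underscores
    return $id;
}

# Build the report lines (without newlines) from the raw input lines.
sub build_report {
    my @lines = @_;
    my (%cds, %region);    # both keyed by the local "start\tend"

    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/;             # ##gff-version etc.
        my @f = split /\t/, $line;
        next if @f < 9;                    # e.g. the "BLAST Hit" row
        my $key = "$f[3]\t$f[4]";
        if ($f[2] eq 'CDS') {
            $cds{$key} = [ @f[5, 7, 8] ];  # score, frame, attributes
        }
        elsif ($f[2] eq 'extracted region') {
            my ($src) = $f[8] =~ /from (\S+?);/;
            my ($s, $arrow, $e) = $f[8] =~ /interval="(\d+)\s*(->|<-)\s*(\d+)"/;
            next unless defined $src && defined $e;
            $region{$key} = {
                seqid  => clean_seqid($src),
                start  => $s,
                end    => $e,
                strand => $arrow eq '->' ? '+' : '-',
            };
        }
    }

    # Pair each CDS with its extracted region, then sort by genomic start.
    my @rows;
    for my $key (keys %cds) {
        my $r = $region{$key} or die "No extracted region for CDS $key\n";
        push @rows, [ $r, @{ $cds{$key} } ];
    }
    die "No CDS rows found\n" unless @rows;
    @rows = sort { $a->[0]{start} <=> $b->[0]{start} } @rows;

    # Item 3: one strand for the whole block; refuse mixed arrows.
    my %strands;
    $strands{ $_->[0]{strand} } = 1 for @rows;
    die "Mixed arrow directions\n" if scalar(keys %strands) > 1;
    my ($strand) = keys %strands;

    # Item 2: the gene line spans the smallest start to the largest end.
    my $gene_start = min map { $_->[0]{start} } @rows;
    my $gene_end   = max map { $_->[0]{end} } @rows;
    my ($score, $frame, $attrs) = @{ $rows[0] }[1 .. 3];
    my $seqid = $rows[0][0]{seqid};

    my @out = join "\t", $seqid, 'gene', $gene_start, $gene_end,
                         $score, $strand, $frame, $attrs;
    for my $row (@rows) {
        my ($r, $sc, $fr, $at) = @$row;
        push @out, join "\t", $r->{seqid}, 'CDS', $r->{start}, $r->{end},
                              $sc, $strand, $fr, $at;
    }
    push @out, '###';
    return @out;
}

# Usage: perl report.pl input1.txt > output1.txt
if (@ARGV) {
    print "$_\n" for build_report(<>);
}
```

The output drops the `Geneious` source column. As a result, the strand sits at index 5 of each output row, which matches the `$array[5]` the question refers to.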
Always use the strict pragma! – 2012-07-11 22:33:39