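I have written a Perl program that takes two inputs: an Excel sheet (converted to a text file by changing the extension from .xls to .txt) and a sequence file. The Excel sheet contains the start and end points of regions in the sequence file (plus 70 flanking values on either side of each matching region), which need to be cut out and written to a third, output file. There are 300 values. The program reads the start and end point of each sequence to be cut, but it repeatedly tells me the value is outside the length of the input, even though it clearly is not. I just can't seem to get this Perl program error fixed.
Here is the program: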
use strict;
use warnings;
my $blast;
my $i;
my $idline;
my $sequence;
print "Enter Your BLAST result file name:\t";
chomp($blast = <STDIN>); # BLAST result file name
print "\n";
my $database;
print "Enter Your Gene list file name:\t";
chomp($database = <STDIN>); # sequence file
print "\n";
open IN, "$blast" or die "Can not open file $blast: $!";
my @ids =();
my @seq_start =();
my @seq_end =();
while (<IN>) {
# split each line of the result file on tabs
my @feilds = split("\t", $_);
push(@ids, $feilds[0]); # copy the name of the sequence
# copy the value at index 6 of the result line: the start point of the region to cut
push(@seq_start, $feilds[6]);
# copy the value at index 7 of the result line: the end point of the region to cut
push(@seq_end, $feilds[7]);
}
close IN;
open OUT, ">Result.fasta" or die "Can not open file $database: $!";
for ($i = 0; $i <= $#ids; $i++) {
($sequence) = &block($ids[$i]);
($idline, $sequence) = split("\n", $sequence);
#extracting the sequence from the start point to the end point
my $seqlen = $seq_end[$i] - $seq_start[$i] - 1;
my $Nucleotides = substr($sequence, $seq_start[$i], $seqlen); # store the extracted substring in $Nucleotides
$Nucleotides =~ s/(.{1,60})/$1\n/gs;
print OUT "$idline\n";
print OUT "$Nucleotides\n";
}
print "\nExtraction Completed...";
sub block {
#block for id storage which is the first tab in the Blast output file.
my $id1 = shift;
print "$id1\n";
my $start =();
open IN3, "$database" or die "Can not open file $database: $!";
my $blockseq = "";
while (<IN3>) {
if (($_ =~ /^>/) && ($start)) {
last;
}
if (($_ !~ /^>/) && ($start)) {
chomp;
$blockseq .= $_;
}
if (/^>$id1/) {
my $start = $. - 1;
my $blockseq .= $_;
}
}
close IN3;
return ($blockseq);
}
BLAST result file: http://www.fileswap.com/dl/Ws7ehftejp/
Sequence file: http://www.fileswap.com/dl/lPwuGh2oKM/
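Error
substr outside of string at Nucleotide_Extractor.pl line 39.
Use of uninitialized value $Nucleotides in substitution (s///) at Nucleotide_Extractor.pl line 41.
Use of uninitialized value $Nucleotides in concatenation (.) or string at Nucleotide_Extractor.pl line 44.
Any help is much appreciated, and questions are always welcome.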
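For the extraction step itself, the arithmetic depends on how the start and end columns are numbered. The following is a minimal sketch, assuming they are 1-based inclusive positions with start no greater than end (the usual convention in BLAST tabular output, but not confirmed for these files). `extract_region` is a hypothetical helper, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumes $start and $end are 1-based, inclusive,
# and $start <= $end; returns nothing if the region does not fit.
sub extract_region {
    my ($sequence, $start, $end) = @_;
    return if $start < 1 || $end < $start || $end > length $sequence;
    my $offset = $start - 1;          # substr counts from 0
    my $length = $end - $start + 1;   # inclusive range
    return substr($sequence, $offset, $length);
}

my $region = extract_region("ABCDEFGHIJ", 3, 6);
print "$region\n";   # prints "CDEF"
```

Under these assumptions the offset is start - 1 and the length is end - start + 1, which differs from the `$seq_end[$i] - $seq_start[$i] - 1` used in the script.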
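What is the phytophthora file? Without it I can't work through the block function. It looks like you are taking a substring whose start index lies beyond the length of the string, like substr("Hello", 45, 4). Since that returns nothing, $Nucleotides is left uninitialized as well. I suggest you check the indices you pass to substr. – xtreak 2014-09-19 05:42:51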
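The out-of-range case described in this comment can be reproduced in isolation. The sketch below uses a hypothetical helper named `safe_substr` (not part of the original script) to show one way of checking the offset before calling substr, so that a bad index is reported instead of producing an undefined value.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the substring, or nothing (undef in scalar
# context) when the string is undefined or the offset lies outside it.
sub safe_substr {
    my ($string, $offset, $length) = @_;
    return if !defined $string;
    return if $offset < 0 || $offset > length $string;
    return substr($string, $offset, $length);
}

my $inside  = safe_substr("Hello", 1, 3);    # offset within the string
my $outside = safe_substr("Hello", 45, 4);   # offset beyond the end

print defined $inside  ? "inside: $inside\n"   : "inside: out of range\n";
print defined $outside ? "outside: $outside\n" : "outside: out of range\n";
```

A check like this also catches the case where the sequence itself is undefined, which produces the same "substr outside of string" message once a non-zero offset is used.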
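@Wordzilla That is the name of the sequence file used in the links I provided in the question. I have uploaded both input files to fileswap and given the links. Please download both files and work with them. The sequence belongs to an organism called Phytophthora. I have now changed the file name. Thanks – 2014-09-19 05:56:01
You should also use strict in your script and declare all variables with 'my', i.e. 'my $sequence = ...'. – 2014-09-19 07:04:22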
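Declaring with my matters in the other direction too: a second my inside an inner block creates a new variable that hides the outer one. The block subroutine in the question does this with $start and $blockseq inside its `if (/^>$id1/)` branch, so the outer $blockseq stays empty and substr later receives an undefined sequence. Here is a minimal sketch of the effect, with illustrative names that do not come from the script:

```perl
use strict;
use warnings;

# Buggy pattern: 'my' inside the inner block declares a new variable.
sub find_shadowed {
    my ($wanted, @items) = @_;
    my $found = "";
    for my $item (@items) {
        if ($item eq $wanted) {
            my $found = $item;   # hides the outer $found; gone at the closing brace
        }
    }
    return $found;               # still the empty string
}

# Corrected pattern: plain assignment updates the outer variable.
sub find_assigned {
    my ($wanted, @items) = @_;
    my $found = "";
    for my $item (@items) {
        if ($item eq $wanted) {
            $found = $item;      # assigns to the outer $found
        }
    }
    return $found;
}

print "shadowed: '", find_shadowed("b", "a", "b"), "'\n";
print "assigned: '", find_assigned("b", "a", "b"), "'\n";
```

Perl gives no warning for the inner declaration because it is in a different scope, so strict and warnings alone do not catch this.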