2014-10-01 40 views

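I have a file with 2 columns. I want to find all the unique patterns in column 2 of the file, the number of times each pattern occurs, and the corresponding partners in column 1. What is wrong with my Perl script? I am trying to count using a hash.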

Below is a sample of my file; columns 1 and 2 are separated by a tab:

OG5_126538 01111111111110 
OG5_126539 01110111110100 
OG5_126552 10000000000000 
OG5_126558 11111111111111 
OG5_126561 11111010000111 
OG5_126566 01111011101001 
OG5_126569 11111111111110 
OG5_126570 11111111111110 
OG5_126572 11111111111110 

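The pattern "11111111111110" in column 2 occurs 3 times, and its associated partners in column 1 are "OG5_126572, OG5_126570, OG5_126569". I want this information for every unique pattern in column 2.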

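I wrote a Perl program, pasted below, but I keep getting errors. I am new to programming. What is wrong with my program? Thanks in advance for all your help. Perl program: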

#!/usr/local/bin/perl 
use strict; 
use warnings; 

if (@ARGV < 1) { 
    print "usage: matrix.pl filename\n"; 
    die; 
} 

my $my_file = shift; 

my (%matrix_pattern); 

open(SOURCE, $my_file); 

while (<SOURCE>) { 
    chomp; 
    my ($group, $pattern) = split("\t", $_); 
    $matrix_pattern{$group} = $pattern; 
    $matrix_pattern{$pattern}++; 
} 

my @unique = values(%matrix_pattern); 
my @sorted_unique = sort @unique; 
foreach my $unique (@sorted_unique) { 
    my $test = $matrix_pattern{$unique}; 
    print "$unique $test\n"; 
} 

close SOURCE; 

Here is the program's output:

01110111110100 1 
01111011101001 1 
01111111111110 1 
Use of uninitialized value $test in concatenation (.) or string at matrix_sample.pl line 27, <SOURCE> line 9. 
1 
Use of uninitialized value $test in concatenation (.) or string at matrix_sample.pl line 27, <SOURCE> line 9. 
1 
Use of uninitialized value $test in concatenation (.) or string at matrix_sample.pl line 27, <SOURCE> line 9. 
1 
Use of uninitialized value $test in concatenation (.) or string at matrix_sample.pl line 27, <SOURCE> line 9. 
1 
Use of uninitialized value $test in concatenation (.) or string at matrix_sample.pl line 27, <SOURCE> line 9. 
1 
Use of uninitialized value $test in concatenation (.) or string at matrix_sample.pl line 27, <SOURCE> line 9. 
1 
10000000000000 1 
11111010000111 1 
11111111111110 3 
11111111111110 3 
11111111111110 3 
11111111111111 1 
Use of uninitialized value $test in concatenation (.) or string at matrix_sample.pl line 27, <SOURCE> line 9. 
3 

Answers

1

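You are trying to use the values of your hash as keys. That is the source of the warnings. Your hash stores both group => pattern and pattern => count entries, so `values` returns counts as well as patterns, and a count such as 1 or 3 does not exist as a key.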
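To see where the undefined values come from, here is a minimal sketch that rebuilds the question's mixed hash for just two sample rows. The sub name `lookup_values_as_keys` is made up for illustration and is not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuilds the question's mixed hash from two rows
# and reports what each value yields when it is used as a key.
sub lookup_values_as_keys {
    my %h;
    for my $row (['OG5_126569', '11111111111110'],
                 ['OG5_126570', '11111111111110']) {
        my ($group, $pattern) = @$row;
        $h{$group} = $pattern;    # group   => pattern
        $h{$pattern}++;           # pattern => count (same hash!)
    }
    # values() returns patterns AND counts; a count such as 2 is not a key.
    return map { [ $_, $h{$_} ] } sort values %h;
}

for my $pair (lookup_values_as_keys()) {
    my ($value, $found) = @$pair;
    print "$value => ", (defined $found ? $found : 'undef'), "\n";
}
```

The two pattern values resolve to their count, while the count itself was never stored as a key, so its lookup is undefined. That undefined lookup is what triggers the "uninitialized value" warning in the question.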

A simple way to accomplish your goal is to use a hash of arrays:

#!/usr/local/bin/perl 
use strict; 
use warnings; 

my $fh = \*DATA; 

my %matrix; 

while (<$fh>) { 
    chomp; 
    my ($group, $pattern) = split ' '; 
    push @{$matrix{$pattern}}, $group; 
} 

for my $pattern (sort keys %matrix) { 
    print $pattern . ' for ' . @{$matrix{$pattern}} . " times. Values are @{$matrix{$pattern}}\n"; 
} 

__DATA__ 
OG5_126538 01111111111110 
OG5_126539 01110111110100 
OG5_126552 10000000000000 
OG5_126558 11111111111111 
OG5_126561 11111010000111 
OG5_126566 01111011101001 
OG5_126569 11111111111110 
OG5_126570 11111111111110 
OG5_126572 11111111111110 

Output:

01110111110100 for 1 times. Values are OG5_126539 
01111011101001 for 1 times. Values are OG5_126566 
01111111111110 for 1 times. Values are OG5_126538 
10000000000000 for 1 times. Values are OG5_126552 
11111010000111 for 1 times. Values are OG5_126561 
11111111111110 for 3 times. Values are OG5_126569 OG5_126570 OG5_126572 
11111111111111 for 1 times. Values are OG5_126558 
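
The answer reads from `__DATA__` to stay self-contained. To run the same idea against a real file, as the question's script does, one possible sketch follows. The sub name `group_by_pattern` and the tab-separated output layout are illustrative choices, not part of the original answer. It uses three-argument `open` with an error check, which the question's `open(SOURCE, $my_file)` lacks.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Illustrative helper (not from the original answer): reads "group pattern"
# lines from a filehandle and returns a hash ref of pattern => [groups].
sub group_by_pattern {
    my ($fh) = @_;
    my %matrix;
    while (my $line = <$fh>) {
        chomp $line;
        my ($group, $pattern) = split ' ', $line;
        next unless defined $pattern;    # skip blank or malformed lines
        push @{ $matrix{$pattern} }, $group;
    }
    return \%matrix;
}

# Runs only when a filename is given: perl matrix.pl filename
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $fh, '<', $file) or die "Cannot open '$file': $!";
    my $matrix = group_by_pattern($fh);
    close $fh;

    # One line per pattern: pattern, count, comma-separated groups.
    for my $pattern (sort keys %$matrix) {
        my @groups = @{ $matrix->{$pattern} };
        print "$pattern\t", scalar(@groups), "\t", join(',', @groups), "\n";
    }
}
```

Keeping the grouping in a sub that accepts any filehandle means the same code works with a file, `\*DATA`, or `STDIN`.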

Thank you so much @Miller. So in this line: `print $pattern . ' for ' . @{$matrix{$pattern}} . " times. Values are @{$matrix{$pattern}}\n";`, **. @{$matrix{$pattern}} .** does the counting and **@{$matrix{$pattern}}** tells me what was counted? – Ousman 2014-10-01 16:01:24


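If you use an array in [`scalar`](http://perldoc.perl.org/functions/scalar.html) context, it returns the number of elements. That is the context the concatenation operator `.` imposes. However, if you interpolate an array in a string, it joins the elements using the value of [`$"`](http://perldoc.perl.org/perlvar.html#%24LIST_SEPARATOR), which defaults to a space. – Miller 2014-10-03 23:30:15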
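
The difference between the two uses can be checked with a small sketch like the one below. The variable names are illustrative, and the array holds three group IDs taken from the sample data.

```perl
use strict;
use warnings;

my @groups = qw(OG5_126569 OG5_126570 OG5_126572);

# Concatenation imposes scalar context: the array yields its element count.
my $count_text = 'count: ' . @groups;

# Interpolation joins the elements with $" (a single space by default).
my $list_text = "groups: @groups";

# Localizing $" changes the separator for this block only.
my $comma_text;
{
    local $" = ',';
    $comma_text = "groups: @groups";
}

print "$count_text\n$list_text\n$comma_text\n";
```

Using `local` restores the default separator once the block ends, so other interpolations in the program are unaffected.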