2012-02-15 122 views
1

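How to extract multiple columns from a CSV file using Perl

I'm very new to Perl and hope someone can help me with this. I need to extract multiple columns from a CSV file that contains embedded commas. This is what the format looks like: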

"ID","URL","DATE","XXID","DATE-LONGFORMAT" 

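I need to extract the DATE column, the XXID column, and the column after XXID. Note that not every row necessarily has the same number of columns.

The XXID column contains a 2-letter prefix that does not always start with the same letters. It can be almost any letter of the alphabet. The length is always the same.

Finally, once these three columns are extracted, I need to sort on the XXID column and get a count of the duplicates.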

Answers

0

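You should definitely use a CPAN library to parse CSV, because a hand-written parser will never account for all the quirks of the format.

See: How can I parse quoted CSV in Perl with a regex?

See: How do I efficiently parse a CSV file in Perl?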
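
To illustrate one such quirk, here is a minimal sketch of what happens when a line is split on every comma. The sample row and the helper name `naive_split` are hypothetical, made up for this illustration; only the embedded comma inside a quoted field matters.

```perl
use strict;
use warnings;

# Naive approach: split on every comma and ignore the quoting entirely.
sub naive_split {
    my ($line) = @_;
    return split /,/, $line;
}

# Hypothetical row with three quoted fields; the second one holds a comma.
my $line = '"1","http://example.com/b,c","2012-02-14"';
my @fields = naive_split($line);

# The row has 3 fields, but the split reports 4 pieces.
print scalar(@fields), " pieces\n";
print "$_\n" for @fields;
```

The quoted URL is cut in two at its embedded comma, which is exactly the kind of case a dedicated CSV parser handles for you.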

However, here is a very naive and non-idiomatic solution for the specific string you provided:

use strict; 
use warnings; 

my $string = '"ID","URL","DATE","XXID","DATE-LONGFORMAT"'; 

my @words =(); 
my $word = ""; 
my $quotec = '"'; 
my $quoted = 0; 

foreach my $c (split //, $string) 
{ 
    if ($quoted) 
    { 
        if ($c eq $quotec) 
        { 
            # closing quote: the current word is complete 
            $quoted = 0; 
            push @words, $word; 
            $word = ""; 
        } 
        else 
        { 
            $word .= $c; 
        } 
    } 
    elsif ($c eq $quotec) 
    { 
        # opening quote: start collecting a new word 
        $quoted = 1; 
    } 
} 

for (my $i = 0; $i < scalar @words; ++$i) 
{ 
    print "column " . ($i + 1) . " = $words[$i]\n"; 
} 
3

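Here is a sample script that uses the Text::CSV module to parse your CSV data. Consult the module's documentation to find the settings that suit your data.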

#!/usr/bin/perl 
use strict; 
use warnings; 
use Text::CSV; 

my $csv = Text::CSV->new({ binary => 1 }); 

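# read rows from the __DATA__ section below; use a real filehandle for a file 
while (my $row = $csv->getline(*DATA)) { 
    print "Date: $row->[2]\n"; 
    print "Col#1: $row->[3]\n"; 
    print "Col#2: $row->[4]\n"; 
} 

__DATA__ 
"ID","URL","DATE","XXID","DATE-LONGFORMAT" 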
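
The question also asks for the XXID values to be sorted and their duplicates counted. A minimal sketch of one way to do that with Text::CSV follows. The helper name `count_xxid` and the sample rows are hypothetical, and the sketch assumes that XXID is always the fourth column and that the input has no header row.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Count how often each XXID value occurs; returns a hash reference.
# Assumption: XXID is the fourth column (index 3) of every full row.
sub count_xxid {
    my ($fh) = @_;
    my $csv = Text::CSV->new({ binary => 1 })
        or die "Cannot use Text::CSV: " . Text::CSV->error_diag;
    my %count;
    while (my $row = $csv->getline($fh)) {
        next if @$row < 4;    # skip short rows that have no XXID column
        $count{ $row->[3] }++;
    }
    return \%count;
}

# Hypothetical sample rows; the second URL contains an embedded comma.
my $sample = <<'END';
"1","http://example.com/a","2012-02-15","AB123456","Wednesday, February 15, 2012"
"2","http://example.com/b,c","2012-02-14","CD654321","Tuesday, February 14, 2012"
"3","http://example.com/d","2012-02-13","AB123456","Monday, February 13, 2012"
END

# Read the sample through an in-memory filehandle.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $count = count_xxid($fh);
close $fh;

# Print each XXID in sorted order together with its repeat count.
printf "%s %d\n", $_, $count->{$_} for sort keys %$count;
```

Keeping the counts in a hash keyed by XXID means the sort only has to order the distinct values, not every row. For a real file, open it by name instead of using the in-memory string.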
3

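I published a module called Tie::Array::CSV, which lets Perl interact with your CSV file as a native nested Perl array. If you use it, you can apply your search logic as if your data were already in an array of array references. Take a look!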

#!/usr/bin/env perl 

use strict; 
use warnings; 

use File::Temp; 
use Tie::Array::CSV; 
use List::MoreUtils qw/first_index/; 
use Data::Dumper; 

# this builds a temporary file from DATA 
# normally you would just make $file the filename 
my $file = File::Temp->new; 
print $file <DATA>; 
######### 

tie my @csv, 'Tie::Array::CSV', $file; 

#find column from data in first row 
my $colnum = first_index { /^\w.{6}$/ } @{$csv[0]}; 
print "Using column: $colnum\n"; 

#extract that column 
my @column = map { $csv[$_][$colnum] } (0..$#csv); 

#build a hash of repetitions 
my %reps; 
$reps{$_}++ for @column; 

print Dumper \%reps; 