How do I parse a multi-line fixed-width file in Perl?

I have a file that I need to parse, in the following format. (All of the delimiters are spaces.)

field name 1:   Multiple word value. 
field name 2:   Multiple word value along 
         with multiple lines. 
field name 3:   Another multiple word 
         and multiple line value.

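I am familiar with how to parse a single-line fixed-width file, but I am stumped on how to handle values that span multiple lines.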


2011-12-14 NeonD

#!/usr/bin/env perl 

use strict; use warnings; 

my (%fields, $current_field); 

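while (my $line = <DATA>) {
    next unless $line =~ /\S/;    # skip blank lines
    $line =~ s/\s+\z//;           # drop the newline and any trailing spaces

    if ($line =~ /^ \s+ (\S .*)/x) {
        # indented line: continuation of the current field's value
        $fields{$current_field} .= " $1" if defined $current_field;
    }
    elsif ($line =~ /^ (.+?) : \s+ (.+)/x) {
        # "name: value" line: start a new field
        $current_field = $1;
        $fields{$current_field} = $2;
    }
}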

use Data::Dumper; 
print Dumper \%fields; 

__DATA__ 
field name 1:   Multiple word value. 
field name 2:   Multiple word value along 
         with multiple lines. 
field name 3:   Another multiple word 
         and multiple line value.


2011-12-14 19:46:10

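Thanks! I changed the first instance of `.+` to `.+?` to make the pattern match non-greedy. This helped me handle values that contain the ":" character. – NeonD 2011-12-14 21:44:38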
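To illustrate the comment above, here is a minimal sketch of how the greedy and non-greedy versions of the field pattern differ when the value itself contains a colon. The sample line is made up for this illustration and is not from the original data.

```perl
use strict;
use warnings;

# Hypothetical sample line whose value itself contains ": "
my $line = "field name 1:   Time: 10:30 and later";

# Greedy: (.+) grabs as much as it can, so the split lands on the LAST
# colon that is followed by whitespace
my ($greedy_name, $greedy_value) = $line =~ /^(.+) : \s+ (.+)/x;

# Non-greedy: (.+?) stops at the FIRST colon that is followed by whitespace
my ($lazy_name, $lazy_value) = $line =~ /^(.+?) : \s+ (.+)/x;

print "greedy: [$greedy_name] [$greedy_value]\n";
# greedy: [field name 1:   Time] [10:30 and later]
print "lazy:   [$lazy_name] [$lazy_value]\n";
# lazy:   [field name 1] [Time: 10:30 and later]
```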

You can do it like this:

#!/usr/bin/perl 

use strict; 
use warnings; 

my @fields; 
open(my $fh, "<", "multi.txt") or die "Unable to open file: $!\n"; 

while (<$fh>) {
    if (/^\s/ && @fields) {
        $fields[-1] .= $_;    # continuation line: append to the last field
    } else {
        push @fields, $_;     # new field
    }
}

close $fh;

If a line starts with whitespace, it is appended to the last element of @fields; otherwise it is pushed onto the end of the array.

Alternatively, slurp the whole file and split it with lookarounds:

#!/usr/bin/perl 

use strict; 
use warnings; 

$/=undef; 

open(my $fh, "<", "multi.txt") or die "Unable to open file: $!\n"; 

my @fields = split/(?<=\n)(?!\s)/, <$fh>; 

close $fh;

This is not a recommended approach, though.
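For reference, a self-contained sketch of the slurp-and-split idea. It assumes an in-memory filehandle in place of multi.txt and uses `local $/` so the record separator is only changed inside one block; `$text`, `$content` and `@records` are names chosen for this illustration.

```perl
use strict;
use warnings;

# Sample text standing in for multi.txt (an assumption for this sketch)
my $text = <<'END';
field name 1:   Multiple word value.
field name 2:   Multiple word value along
         with multiple lines.
END

open(my $fh, "<", \$text) or die "Unable to open in-memory file: $!\n";

my $content = do {
    local $/;    # undef only inside this block: read everything at once
    <$fh>;
};
close $fh;

# Split after each newline that is not followed by whitespace,
# i.e. right before every line that starts a new field
my @records = split /(?<=\n)(?!\s)/, $content;

for my $record (@records) {
    $record =~ s/\s*\n\s*/ /g;    # join continuation lines with one space
    $record =~ s/\s+\z//;         # trim trailing whitespace
}

print "$_\n" for @records;
```

Each element of @records then holds one complete "name: value" entry on a single line.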


2011-12-14 19:51:54 flesk

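Fixed width says unpack to me. It is possible to parse this with regexes and split, but unpack should be the safer choice, since it is the right tool for fixed-width data.

I set the width of the first field to 12 and the blank gap to 13, which works for this data. You may need to change that. The template "A12A13A*" means "find 12, then 13 ASCII characters, followed by ASCII characters of any length". unpack returns a list of these matches. Also, unpack uses $_ if no string is supplied, which is what we do here.

Note that if the first field is not fixed width, as it seems to be in your sample data, you will need to merge the fields in the template, e.g. "A25A*", and then strip the colon.

I chose an array as the storage device because I do not know whether your field names are unique. A hash would overwrite fields with the same name. Another benefit of an array is that it preserves the order in which the data appears in the file. If these things are irrelevant and fast lookup matters more, use a hash instead.

Code: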

use strict; 
use warnings; 
use Data::Dumper; 

my $last_text; 
my @array; 
while (<DATA>) {
    # unpack the fields and strip spaces
    my ($field, undef, $text) = unpack "A12A13A*";
    if ($field) {    # if $field is empty, we have a continuation of a multi-line value
        $field =~ s/:$//;                  # strip the colon
        $last_text = [ $field, $text ];    # store data in an anonymous array
        push @array, $last_text;           # and store that array in @array
    } else {         # continuation lines get added to the previous line's data
        $last_text->[1] .= " $text";
    }
}

print Dumper \@array; 

__DATA__ 
field name 1:            Multiple word value.
field name 2:            Multiple word value along
                         with multiple lines.
field name 3:            Another multiple word
                         and multiple line value
                         with a third line

Output:

$VAR1 = [
          [
            'field name 1',
            'Multiple word value.'
          ],
          [
            'field name 2',
            'Multiple word value along with multiple lines.'
          ],
          [
            'field name 3',
            'Another multiple word and multiple line value with a third line'
          ]
        ];
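The merged template mentioned earlier ("A25A*") combined with hash storage could look roughly like the sketch below. It assumes the value column starts at offset 25 and that field names are unique; the sample lines are built with sprintf so the column layout is explicit, and `@lines`, `%fields` and `$current` are names chosen for this illustration.

```perl
use strict;
use warnings;

# Build sample lines with the value starting at column 25 (assumed layout).
# The field names here have different lengths on purpose.
my @lines = map { sprintf "%-25s%s", @$_ } (
    [ "field name 1:",        "Multiple word value." ],
    [ "a longer field name:", "Multiple word value along" ],
    [ "",                     "with multiple lines." ],
);

my (%fields, $current);
for my $line (@lines) {
    # A25 takes the first 25 characters and strips trailing spaces;
    # A* takes whatever is left
    my ($name, $text) = unpack "A25 A*", $line;

    if (length $name) {
        $name =~ s/:\z//;             # strip the trailing colon
        $current = $name;
        $fields{$current} = $text;    # a repeated name would overwrite here
    }
    elsif (defined $current) {
        $fields{$current} .= " $text";    # continuation of the previous field
    }
}

print "$_ => $fields{$_}\n" for sort keys %fields;
```

The trade-off is the one described above: the hash gives direct lookup by field name, but it loses the file order and silently overwrites duplicate names.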


2011-12-15 00:08:28 TLP

You can change the input record separator:

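$/ = "\nfield name";

while (my $line = <FILE>) {
    chomp $line;    # removes the trailing "\nfield name" separator, if present

    if ($line =~ /(\d+):\s+(.+)/s) {    # /s lets (.+) span continuation lines
        print "Record $1 is $2\n";
    }
}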
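A self-contained sketch of the same record-separator idea, for experimenting without a file on disk. It assumes every field name starts with the literal text "field name" followed by a number, as in the sample data; the in-memory filehandle, `$text` and `%record` are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Sample text standing in for the real file (an assumption for this sketch)
my $text = <<'END';
field name 1:   Multiple word value.
field name 2:   Multiple word value along
         with multiple lines.
END

open(my $fh, "<", \$text) or die "Unable to open in-memory file: $!\n";

my %record;
{
    local $/ = "\nfield name";    # each read now ends where the next field begins
    while (my $chunk = <$fh>) {
        chomp $chunk;             # strips a trailing "\nfield name", if present
        if ($chunk =~ /(\d+):\s+(.+)/s) {
            my ($number, $value) = ($1, $2);
            $value =~ s/\s*\n\s*/ /g;    # join continuation lines with one space
            $value =~ s/\s+\z//;         # trim trailing whitespace
            $record{$number} = $value;
        }
    }
}
close $fh;

print "Record $_ is $record{$_}\n" for sort { $a <=> $b } keys %record;
```

Because the separator is a fixed string, this only works when all field names share a common prefix.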


2011-12-15 16:32:19
