2012-08-17 85 views
-3

Perl: string extraction based on pattern matching

FILE1 has several thousand lines that contain strings terminated by the pattern _Pattern1.

A second file, FILE2, also has several thousand lines containing strings with the same terminating pattern _Pattern1.

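I now have to:

  • Read FILE1 line by line

  • Find out whether the line contains any string terminated by _Pattern1

  • Extract that string and store it in a variable

  • Open FILE2 and read it line by line

  • Find out whether the line just read from FILE2 contains the string stored in the variable above

How is this done in Perl?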

EDIT2:

OK, with a bit of googling and with reference to the links listed below, I solved my problem. Here is the code snippet.

#!/usr/bin/perl 
use strict; 
use warnings; 

my $OriginalHeader=$ARGV[0]; ## Source file 
my $GeneratedHeader=$ARGV[1];## File to compare against 
my $DeltaHeader=$ARGV[2]; ## File to store misses 

my $MatchingPattern="_Pos"; 
my $FoundPattern; 

open FILE1, $OriginalHeader or die $!; 
open FILE2, $GeneratedHeader or die $!; 
open (FILE3, ">$DeltaHeader") or die $!; 

my $lineFromOriginalHeader; 
my $lineFromGeneratedHeader; 
my $TotalMacrosExamined = 0; 
my $TotalMacrosMissed = 0; 

while($lineFromOriginalHeader=<FILE1>) 
{ 
if($lineFromOriginalHeader =~ /$MatchingPattern/) 
    { 
    my $index = index($lineFromOriginalHeader,$MatchingPattern); 

    my $BackIndex = $index; 
    my $BackIndexStart = $index; 

    $BackIndex = $BackIndex - 1; 

    ## Use this while loop to extract the substring. 
    while (1) 
    { 
     my $ExtractedChar = substr($lineFromOriginalHeader,$BackIndex,1); 
     if ($ExtractedChar =~ /\s/) ## stop at the whitespace that precedes the macro name
     { 
     $FoundPattern = substr($lineFromOriginalHeader,$BackIndex + 1,$BackIndexStart + 3 - 
                       $BackIndex); 
     print "Identified $FoundPattern \n"; 
     $TotalMacrosExamined = $TotalMacrosExamined + 1; 
     ##Skip the next line 
     $lineFromOriginalHeader = <FILE1>; 
     last;  
     } 
    else 
    { 
     $BackIndex = $BackIndex - 1; 
    } 

    } ##while(1) 

## We now look for $FoundPattern in FILE2 
while ($lineFromGeneratedHeader = <FILE2>) 
{ 
    if (index($lineFromGeneratedHeader,$FoundPattern)!= -1) 
    { 
    ##Pattern found. Reset file pointer and break out of while loop 
    seek FILE2,0,0; 
    last; 
    } 
    else 
    { 
    if (eof(FILE2) == 1) 
     {   
     print FILE3 "Generated header misses $FoundPattern\n"; 
     $TotalMacrosMissed = $TotalMacrosMissed + 1; 
     seek FILE2,0,0; 
     last;  
     } 
    } 
} ##while (lineFromGeneratedHeader)

} 
else 
{ 
    ##NOP 
} 
} ##while (linefromoriginalheader) 

close FILE1; 
close FILE2; 
close FILE3; 
print "Total number of bitfields examined = $TotalMacrosExamined\n"; 
print "Number of macros obsolete = $TotalMacrosMissed\n"; 
+1

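The steps you describe are pretty good. Are you new to Perl, or new to programming in any language? If you are just new to Perl, almost everything you describe can be found at http://perldoc.perl.org/perlintro.html. Once you have some code to show, we can help you with the tricky parts. – DavidO 2012-08-17 04:56:35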

+0

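There are multiple ways to do this; here is one: `$ perl -ne'exec q;perl;,"-ne",q$print(/\Q$.$1.q;/?"$. YES":$. .q\; NO\;);;"file2" if m;^(.*)_pat1;' file1` This should do the trick, minus a few gotchas. I don't know whether this compiles, but I like the look of it. Note the use of `exec` as a loop terminator. I don't even *have* to assign to a variable :-) There are, of course, simpler ways. [What have you tried?](http://whathaveyoutried.com) – amon 2012-08-17 05:32:33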

+0

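When using regular expressions, we can declare *capture groups* by enclosing everything we want to capture in parens. Your regex would look like `/($MatchingPattern)/`. The contents of the group will be in the special variable `$1` until the next successful regex match. The [Perl regex tutorial](http://perldoc.perl.org/perlretut.html#Extracting-matches) may come in handy while learning Perl regexes. – amon 2012-08-17 06:20:06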
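To make the capture-group comment concrete, here is a minimal sketch. The helper name `extract_macro` is hypothetical, and the `\S*` prefix is an assumption that the wanted string is the whole whitespace-delimited token ending in `_Pos`, not just the suffix itself.

```perl
use strict;
use warnings;

# extract_macro is a hypothetical helper, not part of the original post.
# It returns the whole token ending in "_Pos", or undef if the line has none.
sub extract_macro {
    my ($line) = @_;
    # \S* takes the non-whitespace characters leading up to the suffix;
    # the parentheses make that whole token available as $1.
    return $line =~ /(\S*_Pos)/ ? $1 : undef;
}

my $name = extract_macro("#define Something_Pos (some value)");
print defined $name ? "$name\n" : "no match\n";
```

Testing the match and reading `$1` in the same expression avoids picking up a stale `$1` left over from an earlier successful match.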

Answers

0

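Having programmed in C all my life, I googled the Perl constructs used below and wrote a C-like program. It works flawlessly for me. :-)

Edit: This is to clarify why I have to skip a line in the algorithm below. The pattern that is retrieved and later searched for in the second file occurs on two consecutive lines, so it is enough to reliably detect its first occurrence. Also, as a nitpick, the substring containing the pattern is always guaranteed to be the second substring on the line.

e.g. #define Something_Pos (some value)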
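Since the macro name is guaranteed to be the second substring on the line, the backward character scan could also be replaced by a field split. The following is only a sketch under that assumption; `second_field_macro` is a hypothetical helper name, and the line-skipping logic is not shown.

```perl
use strict;
use warnings;

# second_field_macro is a hypothetical helper, assuming lines shaped like
#   #define Something_Pos (some value)
sub second_field_macro {
    my ($line, $suffix) = @_;
    # split ' ' ignores leading whitespace and splits on runs of whitespace.
    my @fields = split ' ', $line;
    return undef unless @fields >= 2;
    # Accept the second field only if it really ends with the suffix.
    return $fields[1] =~ /\Q$suffix\E\z/ ? $fields[1] : undef;
}

print second_field_macro("#define Something_Pos (some value)", "_Pos"), "\n";
```

The `\Q...\E` quoting keeps any regex metacharacters in the suffix literal, which matters if the suffix is ever supplied from outside the script.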

#!/usr/bin/perl 
use strict; 
use warnings; 

my $OriginalHeader=$ARGV[0]; 
my $GeneratedHeader=$ARGV[1]; 
my $DeltaHeader=$ARGV[2]; 

my $MatchingPattern="_Pos"; 
my $FoundPattern; 

open FILE1, $OriginalHeader or die $!; 
open FILE2, $GeneratedHeader or die $!; 
open (FILE3, ">$DeltaHeader") or die $!; 

my $lineFromOriginalHeader; 
my $lineFromGeneratedHeader; 
my $TotalMacrosExamined = 0; 
my $TotalMacrosMissed = 0; 

while($lineFromOriginalHeader=<FILE1>) 
{ 
if($lineFromOriginalHeader =~ /$MatchingPattern/) 
{ 
    my $index = index($lineFromOriginalHeader,$MatchingPattern); 

    my $BackIndex = $index; 
    my $BackIndexStart = $index; 

    $BackIndex = $BackIndex - 1; 

    ## Use this while loop to extract the substring. 
    while (1) 
    { 
    my $ExtractedChar = substr($lineFromOriginalHeader,$BackIndex,1); 
    if ($ExtractedChar =~ /\s/) ## stop at the whitespace that precedes the macro name
    { 
    $FoundPattern = substr($lineFromOriginalHeader,$BackIndex + 1,$BackIndexStart + 3 - 
                       $BackIndex); 
    print "Identified $FoundPattern \n"; 
    $TotalMacrosExamined = $TotalMacrosExamined + 1; 
    ##Skip the next line 
    $lineFromOriginalHeader = <FILE1>; 
    last;  
    } 
    else 
    { 
    $BackIndex = $BackIndex - 1; 
    } 

} ##while(1) 

## We now look for $FoundPattern in FILE2 
while ($lineFromGeneratedHeader = <FILE2>) 
{ 
##print "Read the following line from FILE2: $lineFromGeneratedHeader\n"; 

    if (index($lineFromGeneratedHeader,$FoundPattern)!= -1) 
    { 
    ##Pattern found. Reset file pointer and break out of while loop 
    seek FILE2,0,0; 
    last; 
    } 
    else 
    { 
    if (eof(FILE2) == 1) 
     {   
     print FILE3 "Generated header misses $FoundPattern\n"; 
     $TotalMacrosMissed = $TotalMacrosMissed + 1; 
     seek FILE2,0,0; 
     last;  
     } 
    } 
} ##while (lineFromGeneratedHeader)

} 
else 
{ 

} 
} ##while (linefromoriginalheader) 

close FILE1; 
close FILE2; 
close FILE3; 
print "Total number of bitfields examined = $TotalMacrosExamined\n"; 
print "Number of macros obsolete = $TotalMacrosMissed\n"; 
+0

This code looks like it works, but I've added an answer that takes this code and makes it more Perlish. http://stackoverflow.com/a/12012931/468327 – 2012-08-17 21:09:06

0

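This is just a first pass at making your code more Perlish. There is actually a lot more that could be done, including the fact that $some_var is generally used in Perl rather than $SomeVar, but I didn't get that far.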

#!/usr/bin/perl 
use strict; 
use warnings; 

my ($OriginalHeader, $GeneratedHeader, $DeltaHeader) = @ARGV; 
my $MatchingPattern=qr/(\S*_Pos)/; # all non-whitespace terminated by _Pos 

open my $file1, '<', $OriginalHeader or die $!; 
open my $file2, '<', $GeneratedHeader or die $!; 
open my $file3, '>', $DeltaHeader  or die $!; 

my $TotalMacrosExamined = 0; 
my $TotalMacrosMissed = 0; 

while(my $lineFromOriginalHeader=<$file1>) { 
    next unless $lineFromOriginalHeader =~ $MatchingPattern; 
    my $FoundPattern = $1; # matched string 

    print "Identified $FoundPattern \n"; 
    $TotalMacrosExamined++; 

    ##Skip the next line 
    <$file1>; 

    ## We now look for $FoundPattern in FILE2 
    my $match_found = 0; 
    while (my $lineFromGeneratedHeader = <$file2>) {
        if (index($lineFromGeneratedHeader,$FoundPattern)!= -1) {
            ##Pattern found. Record the match and break out of while loop
            $match_found++;
            last;
        }
    }

    unless ($match_found) {
        print $file3 "Generated header misses $FoundPattern\n";
        $TotalMacrosMissed++;
    }

    seek $file2,0,0; 

} 

print "Total number of bitfields examined = $TotalMacrosExamined\n"; 
print "Number of macros obsolete = $TotalMacrosMissed\n";
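One further step in the same direction, offered only as a sketch: instead of rescanning FILE2 for every macro, the generated header could be read into a single string once and searched with `index`. The helper `missing_macros` and the sample macro names are hypothetical, and this version uses a `%seen` hash in place of skipping the next line, which is an assumption about what the skip was meant to achieve.

```perl
use strict;
use warnings;

# missing_macros is a hypothetical helper: it takes the original header's
# lines (array ref) and the generated header's full text, and returns the
# *_Pos names that never appear in that text.
sub missing_macros {
    my ($original_lines, $generated_text) = @_;
    my (@missing, %seen);
    for my $line (@$original_lines) {
        next unless $line =~ /(\S*_Pos)/;
        my $name = $1;
        next if $seen{$name}++;    # each name occurs on two consecutive lines
        push @missing, $name if index($generated_text, $name) == -1;
    }
    return @missing;
}

# Tiny in-memory demonstration with made-up macro names.
my @original  = ("#define A_Pos 0\n", "#define A_Msk (1 << A_Pos)\n",
                 "#define B_Pos 1\n", "#define B_Msk (1 << B_Pos)\n");
my $generated = "#define A_Pos 0\n";
print "Generated header misses $_\n" for missing_macros(\@original, $generated);
```

With real files, the generated header can be slurped with `my $generated = do { local $/; <$file2> };`. This trades memory for speed, which seems reasonable for headers of a few thousand lines.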