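Perl: string matching to find the longest substring

2015-11-10 73 views
0
$string1 = "peachbananaapplepear";
$string2 = "juicenaapplewatermelonpear";

I want to find the longest common substring of these two strings that contains the word "apple".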

$string2 =~ m/.+apple.+/; 
print $string2; 

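So I used the match operator, with .+ before and after the keyword "apple" to match any characters. When I print $string2, it does not return naapple; it returns the original $string2.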

Answers

1

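Here is one approach. First, find the positions where 'apple' occurs in each string. Then, for each position in string1, look at every position in string2, and scan to the left and to the right to see how far the two strings stay identical from that starting point.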

$string1 = "peachbananaapplepear12345applegrapeapplebcdefghijk"; 
$string2 = "juicenaapplewatermelonpearkiwi12345applebcdefghijkberryapple"; 

my $SearchFor="apple"; 
my $SearchStrLen = length($SearchFor); 

# Get locations in first string where the search term appears 
my @FirstPositions = getPostions($string1); 
# Get locations in second string where the search term appears 
my @SecondPositions = getPostions($string2); 

CheckForMaxMatch(); 

sub getPostions
{
    my $GivenString = shift;
    my @Positions;
    # Collect the offset of every occurrence of the search term
    my $pos = index($GivenString, $SearchFor);
    while ($pos != -1) {
        push (@Positions, $pos);
        # Resume one character later so adjacent occurrences are not skipped
        $pos = index($GivenString, $SearchFor, $pos + 1);
    }

    return @Positions;
}

sub CheckForMaxMatch
{
    # Bounds of the best match found so far, as offsets into $string1
    my ($MaxLeft, $MaxRight) = (0, 0);
    # From each location of 'apple', look to the left and right
    # to see how far the characters are the same
    for my $i (@FirstPositions) {
        for my $j (@SecondPositions) {
            my $LeftMatchPos = getMaxMatch($i, $j, -1);
            my $RightMatchPos = getMaxMatch($i, $j, 1);

            if (($RightMatchPos - $LeftMatchPos) > ($MaxRight - $MaxLeft)) {
                $MaxLeft = $LeftMatchPos;
                $MaxRight = $RightMatchPos;
            }
        }
    }

    my $LongestSubString = substr($string1, $MaxLeft, $MaxRight-$MaxLeft);
    print "Longest common substring is: $LongestSubString\n";
    print "It begins at $MaxLeft and ends at $MaxRight in string1\n";
}

sub getMaxMatch
{
    my ($i, $j, $direction) = @_;

    if ($direction < 0) {
        # Walk left while the characters just before both positions agree.
        # The bounds checks stop substr from wrapping around on a negative offset.
        my $k = 0;
        while ($i - $k > 0 && $j - $k > 0
               && substr($string1, $i-$k-1, 1) eq substr($string2, $j-$k-1, 1)) {
            $k++;
        }
        return $i - $k;    # start of the common run in string1
    }

    # Walk right, starting just past the search term
    my $k = $SearchStrLen;
    while ($i + $k < length($string1) && $j + $k < length($string2)
           && substr($string1, $i+$k, 1) eq substr($string2, $j+$k, 1)) {
        $k++;
    }
    return $i + $k;        # one past the end of the common run in string1
}
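
The same idea (find every occurrence of the keyword in both strings, then grow the match outward from each pair of occurrences) can be sketched more compactly. This is only an illustrative sketch, not the answer's code: the function name `longest_common_with_key` is hypothetical, it assumes a non-empty keyword, and it returns just the first longest match it finds.

```perl
use strict;
use warnings;

# Hypothetical helper: longest substring common to $s1 and $s2 that contains $key.
sub longest_common_with_key {
    my ($s1, $s2, $key) = @_;
    my $klen = length $key;
    return '' unless $klen;    # assumption: an empty keyword is not meaningful here
    my $best = '';

    # Collect every start offset of $key in a string (overlapping hits included).
    my $find = sub {
        my ($s) = @_;
        my @pos;
        for (my $p = index($s, $key); $p != -1; $p = index($s, $key, $p + 1)) {
            push @pos, $p;
        }
        return @pos;
    };

    my @p2 = $find->($s2);
    for my $i ($find->($s1)) {
        for my $j (@p2) {
            # Extend left while the preceding characters agree.
            my $l = 0;
            $l++ while $i - $l > 0 && $j - $l > 0
                && substr($s1, $i - $l - 1, 1) eq substr($s2, $j - $l - 1, 1);
            # Extend right past the keyword while the characters agree.
            my $r = $klen;
            $r++ while $i + $r < length($s1) && $j + $r < length($s2)
                && substr($s1, $i + $r, 1) eq substr($s2, $j + $r, 1);
            my $cand = substr($s1, $i - $l, $l + $r);
            $best = $cand if length($cand) > length($best);
        }
    }
    return $best;
}

print longest_common_with_key("peachbananaapplepear",
                              "juicenaapplewatermelonpear", "apple"), "\n";
```

With the two strings from the question this prints `naapple`. The explicit bounds checks are there so that `substr` is never called with a negative offset, which Perl would otherwise count from the end of the string.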
0

The =~ operator does not reassign the value of $string2. Try this:

$string2 =~ m/(.+apple.+)/; 
my $match = $1; 
print $match 
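
A slightly more defensive variant only uses the capture when the match actually succeeded. This is a minimal sketch: the helper name `capture_around_apple` is hypothetical, and it relies on a match in list context returning an empty list on failure, so the result is undefined rather than a stale `$1`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured text, or undef when there is no match.
sub capture_around_apple {
    my ($str) = @_;
    # In list context a failed match returns an empty list, leaving $match undefined.
    my ($match) = $str =~ /(.+apple.+)/;
    return $match;
}

for my $str ("juicenaapplewatermelonpear", "orange") {
    my $match = capture_around_apple($str);
    print defined $match ? "matched: $match\n" : "no match in '$str'\n";
}
```

Because `.+` is greedy, the capture for `$string2` is the whole string, so this only confirms that "apple" is present with at least one character on each side. It does not compare the two strings.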
+0

`$1` can contain anything in that code (including "orange"). I fixed it, but Borodin reverted the fix without explanation. – ikegami

+1

True... I would normally check that the match succeeded before asking for the value of `$1`. –

+0

Even with the fix, it doesn't do what the OP asked. It only checks whether `$string2` contains `apple`. – ikegami

0

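This is based on the general algorithm, but it tracks not only the length of the current run (@l) but also whether the run includes the keyword (@k). Only runs that include the keyword are considered for the longest run.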

use strict; 
use warnings; 
use feature qw(say); 

sub find_substrs { 
    our $s; local *s = \shift; 
    our $key; local *key = \shift; 

    my @positions; 
    my $position = -1; 
    while (1) { 
     $position = index($s, $key, $position+1); 
     last if $position < 0; 

     push @positions, $position; 
    } 

    return @positions; 
} 

sub lcsubstr_which_include { 
    our $s1; local *s1 = \shift; 
    our $s2; local *s2 = \shift; 
    our $key; local *key = \shift; 

    my @key_starts1 = find_substrs($s1, $key) 
     or return; 

    my @key_starts2 = find_substrs($s2, $key) 
     or return; 

    my @is_key_start1; $is_key_start1[$_] = 1 for @key_starts1; 
    my @is_key_start2; $is_key_start2[$_] = 1 for @key_starts2; 

    my @s1 = split(//, $s1); 
    my @s2 = split(//, $s2); 

    my $length = 0; 
    my @rv; 
    my @l = (0) x (@s1 + 1); # Last ele is read when $i1==0. 
    my @k = (0) x (@s1 + 1); # Same. 
    for my $i2 (0..$#s2) { 
     for my $i1 (reverse 0..$#s1) { 
     if ($s1[$i1] eq $s2[$i2]) { 
      $l[$i1] = $l[$i1-1] + 1; 
      $k[$i1] = $k[$i1-1] || ($is_key_start1[$i1] && $is_key_start2[$i2]); 

      if ($k[$i1]) { 
       if ($l[$i1] > $length) { 
        $length = $l[$i1]; 
        @rv = [ $i1, $i2, $length ]; 
       } 
       elsif ($l[$i1] == $length) { 
        push @rv, [ $i1, $i2, $length ]; 
       } 
      } 
     } else { 
      $l[$i1] = 0; 
      $k[$i1] = 0; 
     } 
     } 
    } 

    # $i1 and $i2 are the inclusive end offsets of each run; convert them to start offsets.
    for (@rv) { 
     $_->[0] -= $length - 1; 
     $_->[1] -= $length - 1; 
    } 

    return @rv; 
} 

{ 
    my $s1 = "peachbananaapplepear"; 
    my $s2 = "juicenaapplewatermelonpear"; 
    my $key = "apple"; 

    for (lcsubstr_which_include($s1, $s2, $key)) { 
     my ($s1_pos, $s2_pos, $length) = @$_; 
     say substr($s1, $s1_pos, $length); 
    } 
} 

This solution is O(NM), which means it scales very well (for what it does).
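
For small inputs, a brute-force reference can be handy for cross-checking a faster implementation. The sketch below is not part of the answer: `brute_force_lcs_with_key` is a hypothetical name, it assumes a non-empty keyword, and it is far slower than O(NM) because it tests every substring of the first string, longest first.

```perl
use strict;
use warnings;

# Hypothetical brute-force reference: returns every distinct longest substring
# of $s1 that contains $key and also occurs in $s2 (empty list if there is none).
sub brute_force_lcs_with_key {
    my ($s1, $s2, $key) = @_;
    return unless length $key;    # assumption: the keyword is non-empty

    # Try candidate lengths from longest to shortest; stop at the first hit.
    for my $len (reverse length($key) .. length($s1)) {
        my (%seen, @found);
        for my $start (0 .. length($s1) - $len) {
            my $sub = substr($s1, $start, $len);
            next if $seen{$sub}++;    # skip duplicate candidates
            push @found, $sub
                if index($sub, $key) >= 0 && index($s2, $sub) >= 0;
        }
        return @found if @found;
    }
    return;
}

print "$_\n" for brute_force_lcs_with_key(
    "peachbananaapplepear", "juicenaapplewatermelonpear", "apple");
```

For the strings in the question this prints `naapple`, which can be compared against the output of the dynamic-programming version.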

+0

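Thanks for the detailed answer. I was wondering... if I put the 2 strings into arrays, `@str1 = split(//, $string1);` and `@str2 = split(//, $string2);`, would that work? And then find the intersection of the two arrays? Would that simplify the code? – Adrian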

+0

I don't see how getting the letters common to both strings would help. – ikegami