2010-12-07 76 views
8

我遇到过在OCR识别的文本中匹配字符串的问题,并找到它的位置,考虑到可能会出现任意错误,缺失或额外的容错字符。结果应该是最佳匹配位置,可能(不一定)匹配子字符串的长度。如何在模糊匹配的字符串中找到子字符串的位置

例如:

String: 9912, 1.What is your name? 
Substring: 1. What is your name? 
Tolerance: 1 
Result: match on character 7 

String: Where is our caat if any? 
Substring: your cat 
Tolerance: 2 
Result: match on character 10 

String: Tolerance is t0o h1gh. 
Substring: Tolerance is too high; 
Tolerance: 1 
Result: no match 

我试图适应莱文施泰因的算法,但它并不适用于子正常工作,不回位。

Delphi中的算法将是首选,但任何实现或伪逻辑都可以。

回答

8

下面是一个有效的递归实现,但可能不够快。最糟糕的情况是无法找到匹配项,并且“What”中的最后一个字符在Where中的每个索引处都匹配。在这种情况下,该算法将对Where中的每个字符进行Length(What)-1 + Tolerance比较,以及每个Tolerance的一个递归调用。既然公差和什么是常量的长度,我会说算法是O(n)。它的性能会随着“What”和“Where”的长度而线性降低。

function BrouteFindFirst(What, Where:string; Tolerance:Integer; out AtIndex, OfLength:Integer):Boolean; 
    var i:Integer; 
     aLen:Integer; 
     WhatLen, WhereLen:Integer; 

    function BrouteCompare(wherePos, whatPos, Tolerance:Integer; out Len:Integer):Boolean; 
    var aLen:Integer; 
     aRecursiveLen:Integer; 
    begin 
     // Skip perfect match characters 
     aLen := 0; 
     while (whatPos <= WhatLen) and (wherePos <= WhereLen) and (What[whatPos] = Where[wherePos]) do 
     begin 
     Inc(aLen); 
     Inc(wherePos); 
     Inc(whatPos); 
     end; 
     // Did we find a match? 
     if (whatPos > WhatLen) then 
     begin 
      Result := True; 
      Len := aLen; 
     end 
     else if Tolerance = 0 then 
     Result := False // No match and no more "wild cards" 
     else 
     begin 
      // We'll make an recursive call to BrouteCompare, allowing for some tolerance in the string 
      // matching algorithm. 
      Dec(Tolerance); // use up one "wildcard" 
      Inc(whatPos); // consider the current char matched 
      if BrouteCompare(wherePos, whatPos, Tolerance, aRecursiveLen) then 
      begin 
       Len := aLen + aRecursiveLen; 
       Result := True; 
      end 
      else if BrouteCompare(wherePos + 1, whatPos, Tolerance, aRecursiveLen) then 
      begin 
       Len := aLen + aRecursiveLen; 
       Result := True; 
      end 
      else 
      Result := False; // no luck! 
     end; 
    end; 

    begin 

    WhatLen := Length(What); 
    WhereLen := Length(Where); 

    for i:=1 to Length(Where) do 
    begin 
     if BrouteCompare(i, 1, Tolerance, aLen) then 
     begin 
     AtIndex := i; 
     OfLength := aLen; 
     Result := True; 
     Exit; 
     end; 
    end; 

    // No match found! 
    Result := False; 

    end; 

我用下面的代码来测试功能:

procedure TForm18.Button1Click(Sender: TObject); 
var AtIndex, OfLength:Integer; 
begin 
    if BrouteFindFirst(Edit2.Text, Edit1.Text, ComboBox1.ItemIndex, AtIndex, OfLength) then 
    Label3.Caption := 'Found @' + IntToStr(AtIndex) + ', of length ' + IntToStr(OfLength) 
    else 
    Label3.Caption := 'Not found'; 
end; 

对于情况:

​​

它示出了字符9匹配,长度为6的对于其他两个例子给出了预期的结果。

+0

您的解决方案正是我所寻找的,谢谢。 – too 2010-12-07 12:42:40