Find matching links in a text file with PHP

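I have a function that reads a text file and cross-matches it against a directory listing, so that descriptions (from the text file) can be matched up with the directory index of data files. I used the levenshtein function to add some fuzzy logic, so the names don't need to be 100% identical. I've hit a snag with the current setup, though: when I uncomment the lines below, it searches the entire txt file and compares every entry against the directory file name. With each of the 700+ files being checked 700 times, I quickly run out of memory. I need some way to break out of the while (!feof($file_handle)) loop when it finds a match, and then some way to set the starting point of the next pass to the line position where we stopped, so it doesn't loop 0-700 every time.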

function GenerateList($titleB, $descB, $thumbB, $dirB, $patternB){ 
$outputB = "<CATEGORY name=\"$titleB\" desc=\"$descB\" thumb=\"$thumbB\">"; 
$open_error = 0; 

if (is_dir($dirB)){ 
$myDirectory = opendir($dirB); 
$dirArray = array(); 
// get each entry; compare against false so a file named "0" does not end the loop 
while (($entryName = readdir($myDirectory)) !== false) { 
    $dirArray[] = $entryName; 
} 

// close directory 
closedir($myDirectory); 

// count elements in array 
$indexCount = count($dirArray); 

// sort em 
sort($dirArray); 
// loop through the array of files and print them all 
if (!($text = file_get_contents("Scripts/descriptions.txt"))){$open_error = 1;} 
$results = array(); 
for($index=0; $index < $indexCount; $index++) { 
    $ext = explode(".", $dirArray[$index]); 
    $parsed_title = preg_replace ($patternB, "", $ext[0]); 
    if ((substr($dirArray[$index], 0, 1) != ".") && isset($ext[1]) && ($ext[1] == "flv")){ // don't list hidden files; isset() guards names with no extension 

//if ($open_error == 0){ 
// $file_handle = fopen("Scripts/descriptions.txt", "rb"); 

//while (!feof($file_handle)) { 
//$line_of_text = fgets($file_handle); 
//$parts = explode('|', $line_of_text); 
/* 
echo "<PRE>"; 
echo strtolower($parts[0]); 
echo "</br>"; 
echo strtolower($parsed_title); 
echo "</br>"; 
echo "</PRE>"; 
*/ 
//if ((wordMatch(strtolower($parts[0]), strtolower($parsed_title), 2)) > 0){ 
     $outputB .= "<ITEM>"; 
     $outputB .= "<file_path>/Sources/Power Rangers/$dirB".$dirArray[$index]."</file_path>"; 
     $outputB .= "<file_width>500</file_width>"; 
     $outputB .= "<file_height>375</file_height>"; 
     $outputB .= "<file_title>".$parsed_title."</file_title>"; 
//  $outputB .= "<file_desc>".$parts[1]."</file_desc>"; 
     $outputB .= "<file_desc>test</file_desc>"; 
//  $outputB .= "<file_image>".$match_result[2]."</file_image>"; 
     $outputB .= "<file_image>$thumbB</file_image>"; 
//  $outputB .= "<featured_image>".$match_result[3]."</featured_image>"; 
     $outputB .= "<featured_image>$thumbB</featured_image>"; 
//  $outputB .= "<featured_or_not>".$parts[4]."</featured_or_not>"; 
     $outputB .= "<featured_or_not>true</featured_or_not>"; 
     $outputB .= "</ITEM>"; 
//};//if ((wordMatch($parts[0], strtolower($word), 2) > 0) 
//};//while 
//fclose($file_handle); 

//};//if ($open_error == 0) 
    };//if ((substr("$dirArray[$index]", 0, 1) != ".")&&($ext[1] == "flv")) 
};//for($index=0; $index < $indexCount; $index++) 
};//if (is_dir($dirB)) 
$outputB .= "</CATEGORY>"; 
return $outputB; 
};//function 

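    // Returns the entry of $words (an array of strings) that is closest to 
    // $input by Levenshtein distance, or 0 if none is within $sensitivity edits. 
    // Adapted from http://php.net/manual/en/function.levenshtein.php 
    function wordMatch($words, $input, $sensitivity){ 
     $closest = 0;   // stays 0 when $words is empty 
     $shortest = -1; // -1 means no distance has been measured yet 
     foreach ($words as $word) { 
      $lev = levenshtein($input, $word); 
      if ($lev == 0) { 
       // exact match, no need to look further 
       $closest = $word; 
       $shortest = 0; 
       break; 
      } //if 
      if ($lev <= $shortest || $shortest < 0) { 
       $closest = $word; 
       $shortest = $lev; 
      } //if 
     } //foreach 
     if($shortest >= 0 && $shortest <= $sensitivity){ 
      return $closest; 
     } else { 
      return 0; 
     } //if/else 
    } // function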
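
One way to avoid rescanning the file for every directory entry is to parse it once and stop at the first acceptable match. The following is only a sketch under assumptions: it assumes each line has the `title|description|...` shape implied by the `explode('|', ...)` call above, and `loadDescriptions` and `findDescription` are hypothetical names, not part of the original code.

```php
<?php
// Hypothetical helper: parse "title|description|..." lines into an array once.
function loadDescriptions(string $path): array {
    $rows = [];
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return $rows; // the caller can treat an empty list as "no descriptions"
    }
    while (($line = fgets($handle)) !== false) {
        $parts = explode('|', rtrim($line, "\r\n"));
        if ($parts[0] !== '') {
            $rows[] = $parts; // skip blank lines
        }
    }
    fclose($handle);
    return $rows;
}

// Hypothetical helper: return the first row whose title is within
// $sensitivity edits of $title, or null if there is none.
function findDescription(array $rows, string $title, int $sensitivity): ?array {
    $needle = strtolower($title);
    foreach ($rows as $parts) {
        $distance = levenshtein(strtolower($parts[0]), $needle);
        // Older PHP versions return -1 for over-long strings, hence the >= 0 check.
        if ($distance >= 0 && $distance <= $sensitivity) {
            return $parts; // early exit, the equivalent of break in the while loop
        }
    }
    return null;
}
```

With this shape, the file is read once before the `for` loop, and each directory entry only costs an in-memory scan that ends at the first hit.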

Source

2012-07-29 NekoLLX

How do you define "80%"? A regular expression either matches or it doesn't. – ghoti 2012-07-29 03:01:23

That's the tricky part. If $parsed is, say, "Peace Love & Woe" and the match is "Peace Love & woe", "Peace Love and Woe", or "Peace Love and Woe.avi", they should all be valid. – NekoLLX 2012-07-29 03:16:37

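So... your "80%" rule isn't really a defined rule; it's something you want help defining? Have you considered [fuzzy logic](http://en.wikipedia.org/wiki/Fuzzy_logic)? You can't implement it with a regular expression, but it might get you closer to your goal. Also, including some sample data (and your idea of how much of a match each item represents) would make it easier to write something that meets your requirements. – ghoti 2012-07-29 03:35:50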

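Instead of a regular expression, you can compute the edit distance between the two items. Your 80% heuristic then amounts to saying (length - edit_distance) / length >= .8, where length is the length of the string you are trying to match.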

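So if the string is 20 characters long and the edit distance to your target is 2, you compute (20 - 2) / 20 == .9. In other words, the item is a 90% match for the target. That is higher than .8, so you accept it as a match.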

Note that "edit distance" is also known as Levenshtein distance, so you can just do this:

$len = (float) strlen($target); // Avoids integer division. 
$match = ($len-levenshtein($input, $target))/$len; 

if ($match >= 0.8) { 
    // The $input matches our $target 
}
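
The same ratio can be wrapped in a small function that also handles an empty target, which would otherwise divide by zero. This is a sketch rather than a definitive implementation; `similarity` is a hypothetical name, and the 20-character strings are made up to mirror the (20 - 2) / 20 example above.

```php
<?php
// Hypothetical helper: the share of $target that survives the edits needed
// to turn $input into $target, clamped to the range 0.0 to 1.0.
function similarity(string $input, string $target): float {
    $len = strlen($target);
    if ($len === 0) {
        // Avoid dividing by zero: two empty strings match, anything else does not.
        return $input === '' ? 1.0 : 0.0;
    }
    $ratio = ($len - levenshtein($input, $target)) / $len;
    return max(0.0, $ratio); // a much longer $input would otherwise go negative
}

// A 20-character target and an input that differs in two positions.
$target = 'abcdefghijklmnopqrst';
$input  = 'abXdefghijklmnopqrsY';
var_dump(similarity($input, $target) >= 0.8); // (20 - 2) / 20 = 0.9, so bool(true)
```

Note that the ratio depends on the target's length, so a short title tolerates fewer edits than a long one at the same 0.8 threshold.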

Source

2012-07-29 04:24:13 Qsario

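Good idea. If you can include (or point to) some sample code that computes the edit distance between strings in PHP, that would surely earn you some StackOverflow points. :) – ghoti 2012-07-29 13:51:58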

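There's still the problem of searching the txt file heuristically for a matching link. I really want to avoid loading the whole TXT into a variable or array only to throw most of it away, because I have to go through it about 700 times, once for each entry that populates the xml file. – NekoLLX 2012-07-29 23:43:31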

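I think edit distance is already a PHP function? http://php.net/manual/en/function.levenshtein.php As for the rest, don't all the links start with http:// or https://? You could pull out everything that looks like a link and then do your fuzzy matching on that. – Qsario 2012-07-30 17:33:10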
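
The suggestion in the last comment could look roughly like the sketch below. It rests on assumptions not confirmed in the thread: that the links in the text file are plain http:// or https:// URLs, and that the part worth comparing to a title is the file name at the end of the URL path. `extractLinks` and `closestLink` are hypothetical names.

```php
<?php
// Hypothetical helper: collect every http:// or https:// URL found in $text.
// The pattern is deliberately loose; it stops at whitespace, quotes,
// angle brackets and the "|" field separator.
function extractLinks(string $text): array {
    preg_match_all('~https?://[^\s"\'<>|]+~i', $text, $matches);
    return array_values(array_unique($matches[0]));
}

// Hypothetical helper: return the link whose file name (without extension)
// is closest to $title, or null if none is within $sensitivity edits.
function closestLink(array $links, string $title, int $sensitivity): ?string {
    $best = null;
    $bestDistance = $sensitivity + 1;
    $needle = strtolower($title);
    foreach ($links as $link) {
        $path = (string) parse_url($link, PHP_URL_PATH);
        $name = strtolower(pathinfo($path, PATHINFO_FILENAME));
        $distance = levenshtein($name, $needle);
        if ($distance >= 0 && $distance < $bestDistance) {
            $best = $link;
            $bestDistance = $distance;
        }
    }
    return $best;
}
```

Extracting the links once and reusing the resulting array for every title keeps the text file from being read again on each pass.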
