2011-12-13 71 views
5

Performance of str_replace in PHP

Here are two ways to use str_replace to replace strings in a given phrase.

// Method 1 
$phrase = "You should eat fruits, vegetables, and fiber every day."; 
$healthy = array("fruits", "vegetables", "fiber"); 
$yummy = array("pizza", "beer", "ice cream"); 
$phrase = str_replace($healthy, $yummy, $phrase); 

// Method 2 
$phrase = "You should eat fruits, vegetables, and fiber every day."; 
$phrase = str_replace("fruits", "pizza", $phrase); 
$phrase = str_replace("vegetables", "beer", $phrase); 
$phrase = str_replace("fiber", "ice cream", $phrase); 

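Which method is more efficient, in terms of execution time and resource usage?

Assume the real phrase is much longer (e.g. 50,000 characters) and that there are many more pairs of words to replace.

My thinking is that Method 2 calls str_replace three times, which costs more function calls; on the other hand, Method 1 creates two arrays, and str_replace has to walk both arrays at runtime.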
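To make the comparison concrete: when str_replace is given arrays, it applies the pairs one after another, in order, each one working on the result of the previous one. So Method 1 does roughly the same work as the loop below, and both methods produce the same string. This is only a sketch; the helper name replace_pairs_in_order is made up for illustration.

```php
<?php
// Hypothetical helper: emulates str_replace() with array arguments by
// applying each search/replace pair in order, as Method 2 does by hand.
function replace_pairs_in_order(array $search, array $replace, string $subject): string
{
    $search  = array_values($search);
    $replace = array_values($replace);
    foreach ($search as $i => $needle) {
        $subject = str_replace($needle, $replace[$i], $subject);
    }
    return $subject;
}

$phrase  = "You should eat fruits, vegetables, and fiber every day.";
$healthy = array("fruits", "vegetables", "fiber");
$yummy   = array("pizza", "beer", "ice cream");

$method1 = str_replace($healthy, $yummy, $phrase);
$method2 = replace_pairs_in_order($healthy, $yummy, $phrase);

var_dump($method1 === $method2); // bool(true)
```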

+0

Method 1 is faster. – djot

+1

Neither is a good choice. If you have a long string that needs str_replace over and over, why not save the result after the str_replace? – ajreal

+0

If you create the arrays $healthy and $yummy over and over inside a loop, it will be slower than declaring them outside the loop. – djot

Answers

5

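I would prefer Method 1, as it is cleaner and more organized. Method 1 also makes it possible to take the pairs from another source, e.g. a bad-words table in a database. Method 2 would need another loop of some sort.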
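As a sketch of the "pairs from another source" point: if the pairs arrive as a map, Method 1 stays a single call, while Method 2 turns into an explicit loop. The $pairs array below is a hard-coded stand-in for rows from a hypothetical database table; its name and contents are invented for illustration.

```php
<?php
// Stand-in for rows fetched from a database table (hypothetical data).
$pairs = array(
    "fruits"     => "pizza",
    "vegetables" => "beer",
    "fiber"      => "ice cream",
);

$phrase = "You should eat fruits, vegetables, and fiber every day.";

// Method 1: one call, however many pairs there are.
$viaArrays = str_replace(array_keys($pairs), array_values($pairs), $phrase);

// Method 2: the hand-written calls become a loop over the pairs.
$viaLoop = $phrase;
foreach ($pairs as $search => $replace) {
    $viaLoop = str_replace($search, $replace, $viaLoop);
}

echo $viaArrays, "\n";
```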

<?php 
$time_start = microtime(true); 
for($i=0;$i<=1000000;$i++){ 
    // Method 1 
    $phrase = "You should eat fruits, vegetables, and fiber every day."; 
    $healthy = array("fruits", "vegetables", "fiber"); 
    $yummy = array("pizza", "beer", "ice cream"); 
    $phrase = str_replace($healthy, $yummy, $phrase); 
} 
$time_end = microtime(true); 
$time = $time_end - $time_start; 
echo "Did Test 1 in ($time seconds)\n<br />"; 



$time_start = microtime(true); 
for($i=0;$i<=1000000;$i++){ 
    // Method2 
    $phrase = "You should eat fruits, vegetables, and fiber every day."; 
    $phrase = str_replace("fruits", "pizza", $phrase); 
    $phrase = str_replace("vegetables", "beer", $phrase); 
    $phrase = str_replace("fiber", "ice cream", $phrase); 

} 
$time_end = microtime(true); 
$time = $time_end - $time_start; 
echo "Did Test 2 in ($time seconds)\n"; 
?> 

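Did Test 1 in (3.6321988105774 seconds)

Did Test 2 in (2.8234610557556 seconds)


Edit: On further testing with the string repeated 50k times, fewer iterations, and the advice from ajreal, the difference is very small.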

<?php 
$phrase = str_repeat("You should eat fruits, vegetables, and fiber every day.",50000); 
$healthy = array("fruits", "vegetables", "fiber"); 
$yummy = array("pizza", "beer", "ice cream"); 

$time_start = microtime(true); 
for($i=0;$i<=10;$i++){ 
    // Method 1 
    $phrase = str_replace($healthy, $yummy, $phrase); 
} 
$time_end = microtime(true); 
$time = $time_end - $time_start; 
echo "Did Test 1 in ($time seconds)\n<br />"; 



$time_start = microtime(true); 
for($i=0;$i<=10;$i++){ 
    // Method2 
    $phrase = str_replace("fruits", "pizza", $phrase); 
    $phrase = str_replace("vegetables", "beer", $phrase); 
    $phrase = str_replace("fiber", "ice cream", $phrase); 

} 
$time_end = microtime(true); 
$time = $time_end - $time_start; 
echo "Did Test 2 in ($time seconds)\n"; 
?> 

Did Test 1 in (1.1450328826904 seconds)

Did Test 2 in (1.3119208812714 seconds)

+0

But your test shows that Method 2 performs better? – ajreal

+0

Yes, but I'd sacrifice a fraction of 0.9 seconds over 1 million iterations for better coding and scalability. –

+1

May I suggest that you put the array declarations outside the loop? – ajreal

3

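Even though this is old, this benchmark is incorrect.

Thanks to an anonymous user:

"This test is wrong, because when Test 3 starts, $phrase holds the result of Test 2, in which there is nothing left to replace.

When I reset $phrase before Test 3, the results are: Did Test 1 in (4.3436799049377 seconds), Did Test 2 in (5.7581660747528 seconds), Did Test 3 in (7.5069718360901 seconds)"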

 <?php 
     $time_start = microtime(true); 

     $healthy = array("fruits", "vegetables", "fiber"); 
     $yummy = array("pizza", "beer", "ice cream"); 

     for($i=0;$i<=1000000;$i++){ 
      // Method 1 
      $phrase = "You should eat fruits, vegetables, and fiber every day."; 
      $phrase = str_replace($healthy, $yummy, $phrase); 
     } 
     $time_end = microtime(true); 
     $time = $time_end - $time_start; 
     echo "Did Test 1 in ($time seconds)<br /><br />"; 



     $time_start = microtime(true); 
     for($i=0;$i<=1000000;$i++){ 
      // Method2 
      $phrase = "You should eat fruits, vegetables, and fiber every day."; 
      $phrase = str_replace("fruits", "pizza", $phrase); 
      $phrase = str_replace("vegetables", "beer", $phrase); 
      $phrase = str_replace("fiber", "ice cream", $phrase); 

     } 
     $time_end = microtime(true); 
     $time = $time_end - $time_start; 
     echo "Did Test 2 in ($time seconds)<br /><br />"; 




     $time_start = microtime(true); 
     for($i=0;$i<=1000000;$i++){ 
       foreach ($healthy as $k => $v) { 
        if (strpos($phrase, $healthy[$k]) === FALSE) 
        unset($healthy[$k], $yummy[$k]); 
       }           
       if ($healthy) $new_str = str_replace($healthy, $yummy, $phrase); 

     } 
     $time_end = microtime(true); 
     $time = $time_end - $time_start; 
     echo "Did Test 3 in ($time seconds)<br /><br />"; 

     ?> 

Did Test 1 in (3.5785729885101 seconds)

Did Test 2 in (3.8501658439636 seconds)

Did Test 3 in (0.13844394683838 seconds)
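One way to avoid the carried-over $phrase problem described in the quote is to hand every timed run the same untouched subject. The following is a rough sketch, not the original poster's code: the helper time_replacements and its parameters are hypothetical, and the timings it prints will vary by machine and PHP version.

```php
<?php
// Hypothetical timing helper: calls $fn($subject) $iterations times,
// passing the same untouched $subject on every call, and returns seconds.
function time_replacements(callable $fn, string $subject, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($subject); // passed by value, so $subject is never modified
    }
    return microtime(true) - $start;
}

$phrase  = "You should eat fruits, vegetables, and fiber every day.";
$healthy = array("fruits", "vegetables", "fiber");
$yummy   = array("pizza", "beer", "ice cream");

// Method 1: array arguments.
$method1 = function (string $s) use ($healthy, $yummy): string {
    return str_replace($healthy, $yummy, $s);
};

// Method 2: three separate calls.
$method2 = function (string $s): string {
    $s = str_replace("fruits", "pizza", $s);
    $s = str_replace("vegetables", "beer", $s);
    return str_replace("fiber", "ice cream", $s);
};

printf("Method 1: %.4f seconds\n", time_replacements($method1, $phrase, 100000));
printf("Method 2: %.4f seconds\n", time_replacements($method2, $phrase, 100000));
```

Because both methods go through the same closure call, the call overhead is the same on both sides and only the replacement work differs.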

1

@djot, you have an error in this code:

<?php 
    foreach ($healthy as $k => $v) { 
     if (strpos($phrase, $healthy[$k]) === FALSE) 
      unset($healthy[$k], $yummy[$k]); 
     } 

Here is a fixed version, along with a better and simpler new Test 4:

<?php 
$time_start = microtime(true); 

     $healthy = array("fruits", "vegetables", "fiber"); 
     $yummy = array("pizza", "beer", "ice cream"); 

     for($i=0;$i<=1000000;$i++){ 
      // Method 1 
      $phrase = "You should eat fruits, vegetables, and fiber every day."; 
      $phrase = str_replace($healthy, $yummy, $phrase); 
     } 
     $time_end = microtime(true); 
     $time = $time_end - $time_start; 
     echo "Did Test 1 in ($time seconds)". PHP_EOL. PHP_EOL; 



     $time_start = microtime(true); 
     for($i=0;$i<=1000000;$i++){ 
      // Method2 
      $phrase = "You should eat fruits, vegetables, and fiber every day."; 
      $phrase = str_replace("fruits", "pizza", $phrase); 
      $phrase = str_replace("vegetables", "beer", $phrase); 
      $phrase = str_replace("fiber", "ice cream", $phrase); 

     } 
     $time_end = microtime(true); 
     $time = $time_end - $time_start; 
     echo "Did Test 2 in ($time seconds)" . PHP_EOL. PHP_EOL; 




     $time_start = microtime(true); 
     for($i=0;$i<=1000000;$i++){ 
      $a = $healthy; 
      $b = $yummy; 
       foreach ($healthy as $k => $v) { 
        if (strpos($phrase, $healthy[$k]) === FALSE) 
        unset($a[$k], $b[$k]); 
       }           
       if ($a) $new_str = str_replace($a, $b, $phrase); 

     } 
     $time_end = microtime(true); 
     $time = $time_end - $time_start; 
     echo "Did Test 3 in ($time seconds)". PHP_EOL. PHP_EOL; 



     $time_start = microtime(true); 
     for($i=0;$i<=1000000;$i++){ 
      $ree = false; 
      foreach ($healthy as $k) { 
       if (strpos($phrase, $k) !== FALSE) { //something to replace 
        $ree = true; 
        break; 
       } 
      }           
      if ($ree === true) { 
       $new_str = str_replace($healthy, $yummy, $phrase); 
      } 
     } 
     $time_end = microtime(true); 
     $time = $time_end - $time_start; 
     echo "Did Test 4 in ($time seconds)". PHP_EOL. PHP_EOL; 

Did Test 1 in (0.38219690322876 seconds)

Did Test 2 in (0.42352104187012 seconds)

Did Test 3 in (0.47777700424194 seconds)

Did Test 4 in (0.19691610336304 seconds)
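The Test 4 idea can be wrapped in a small function: check whether any search string occurs at all, and only then call str_replace. This is a sketch under the assumption that no search string is empty; the name replace_if_present is invented for illustration.

```php
<?php
// Hypothetical helper wrapping the Test 4 idea: only call str_replace()
// when at least one search string actually occurs in the subject.
// Assumes every element of $search is a non-empty string.
function replace_if_present(array $search, array $replace, string $subject): string
{
    foreach ($search as $needle) {
        if (strpos($subject, $needle) !== false) {
            // Something to replace: do the full replacement once.
            return str_replace($search, $replace, $subject);
        }
    }
    return $subject; // nothing to replace, skip the call entirely
}

$healthy = array("fruits", "vegetables", "fiber");
$yummy   = array("pizza", "beer", "ice cream");

echo replace_if_present($healthy, $yummy, "You should eat fruits, vegetables, and fiber every day."), "\n";
echo replace_if_present($healthy, $yummy, "Nothing to change here."), "\n";
```

The pre-check only saves time when subjects often contain none of the search strings; when a match is the usual case, it adds an extra scan before the replacement.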

0

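Although the question does not ask this directly, the OP does state:

Assume the real phrase is much longer (e.g. 50,000 characters) and that there are many more pairs of words to replace.

In that case, if you don't need (or want) replacements within replacements, it may be more efficient to use a preg_replace_callback solution, so that the whole string is processed only once rather than once per pair, and no replacements happen within replacements.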
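A small sketch of what "replacements within replacements" means in practice. The pairs a → b and b → c are made-up sample data chosen so that the output of the first pair is matched by the second.

```php
<?php
$search  = array("a", "b");
$replace = array("b", "c");
$subject = "ab";

// str_replace() applies the pairs in order, so the "b" produced by the
// first pair is replaced again by the second pair.
$chained = str_replace($search, $replace, $subject);

// One pass with preg_replace_callback(): each match is looked up once,
// and text that has already been replaced is never scanned again.
$lookup = array_combine($search, $replace);
$singlePass = preg_replace_callback(
    '/a|b/',
    function (array $m) use ($lookup): string {
        return $lookup[$m[0]];
    },
    $subject
);

var_dump($chained);    // string(2) "cc"
var_dump($singlePass); // string(2) "bc"
```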

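Here is a generic function. In my case, with a 1.5 MB string and ~20,000 replacement pairs, it was about 10 times faster. However, because of the "regular expression is too large" error, the replacements have to be split into chunks, which means replacements within replacements may then occur, indeterminately (in my particular case this was not possible).

In my particular case I was able to optimize further, to about a 100-fold performance gain, because my search strings all followed a specific pattern. (PHP version 7.1.11 on Windows 7 32-bit.)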

function str_replace_bulk($search, $replace, $subject, &$count = null) {
    // Assumes $search and $replace are equal sized arrays
    $lookup = array_combine($search, $replace);
    $result = preg_replace_callback(
        '/' .
            implode('|', array_map(
                function($s) {
                    return preg_quote($s, '/');
                },
                $search
            )) .
        '/',
        function($matches) use($lookup) {
            return $lookup[$matches[0]];
        },
        $subject,
        -1,
        $count
    );
    if (
        $result !== null ||
        count($search) < 2 // avoid infinite recursion on error
    ) {
        return $result;
    }
    // With a large number of replacements (> ~2500?),
    // PHP bails because the regular expression is too large.
    // Split the search and replacements in half and process each separately.
    // NOTE: replacements within replacements may now occur, indeterminately.
    $split = (int)(count($search)/2);
    error_log("Splitting into 2 parts with ~$split replacements");
    $result = str_replace_bulk(
        array_slice($search, $split),
        array_slice($replace, $split),
        str_replace_bulk(
            array_slice($search, 0, $split),
            array_slice($replace, 0, $split),
            $subject,
            $count1
        ),
        $count2
    );
    $count = $count1 + $count2;
    return $result;
}
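The answer does not show its pattern-specific optimization, so the following is only a guess at what such an approach could look like. It assumes, purely for illustration, that every search string is a placeholder of the form {{name}}; the function replace_placeholders and the sample data are hypothetical. The point is that one small, fixed regex finds all candidates and an array lookup picks the replacement, so the pattern does not grow with the number of pairs.

```php
<?php
// Hypothetical case: every search string is a placeholder like {{name}}.
// A single fixed regex finds all candidates; the lookup array decides
// what each one becomes.
function replace_placeholders(array $lookup, string $subject)
{
    return preg_replace_callback(
        '/\{\{\w+\}\}/',
        function (array $m) use ($lookup): string {
            // Leave unknown placeholders untouched.
            return array_key_exists($m[0], $lookup) ? $lookup[$m[0]] : $m[0];
        },
        $subject
    );
}

$lookup = array(
    "{{food}}"  => "pizza",
    "{{drink}}" => "beer",
);

echo replace_placeholders($lookup, "Have {{food}} and {{drink}}, not {{fiber}}."), "\n";
// prints: Have pizza and beer, not {{fiber}}.
```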