PHP - 为什么比较两个整个长（相同）的字符串比比较每个字符的第一个字符要快得多？

对于我的一个应用程序，我认为比较2个字符串的第一个字符比比较整个字符串的速度要快。例如，如果我知道只有两个可能的字符串（在一组字符串中）可以以同一个字母开始（比如'q'），如果是这样，它们是相同的字符串，那么我可以写一个比较像这样：PHP - 为什么比较两个整个长（相同）的字符串比比较每个字符的第一个字符要快得多？

if ($stringOne[0] === $stringTwo[0]) $qString = true;

代替：

if ($stringOne === $stringTwo) $qString = true;

但我最近写了一些剧本基准，似乎我假设是错误的。也就是说，它看起来是第二次比较比第二次比平均快2-4倍。我的基准是这样的：

$x = 'A really really looooooooooooong string'; 
$y = 'A really really looooooooooooong string'; 

$timeArray = array(); 

//Method 1, two-four times faster than Method 2 
for($i = 0; $i < 100; $i++) { 
    $t1 = microtime(true); 
    for($j = 0; $j < 100000; $j++) { 
     if ($x === $y) continue; 
    } 
    $t2 = microtime(true); 
    $timeArray[] = $t2 - $t1; 
} 

echo array_sum($timeArray)/100;//average time is echoed 

//Method 2 
for($i = 0; $i < 100; $i++) { 
    $t1 = microtime(true); 
    for($j = 0; $j < 100000; $j++) { 
     if ($x[0] === $y[0]) continue; 
    } 
    $t2 = microtime(true); 
    $timeArray[] = $t2 - $t1; 
} 

echo array_sum($timeArray)/100;//average time is echoed

我想我假设，因为每个字符串$ x和$ y是在内存中，然后每一个的第一个字符是在内存中也是一样，比较会更快。

为什么整串比较更快？从每个字符串中提取第一个字符进行比较是否有“成本”？

更新：即使在每个外部循环迭代中生成新字符串并进行比较，或者起始字符串相同或不相同时，Method1对于我来说仍然比Method2快。

//Method 1 faster than Method 2 by 2-3 times 
for($i = 0; $i < 100; $i++) { 
    $t1 = microtime(true); 
    $a = $x . $i; 
    $b = $y . $i; 
    for($j = 0; $j < 100000; $j++) { 
     if ($a === $b) continue; 
    } 
    $t2 = microtime(true); 
    $timeArray[] = $t2 - $t1; 
} 

//Method 2 
for($i = 0; $i < 100; $i++) { 
    $t1 = microtime(true); 
    $a = $x . $i; 
    $b = $y . $i; 
    for($j = 0; $j < 100000; $j++) { 
     if ($a[0] === $b[0]) continue; 
    } 
    $t2 = microtime(true); 
    $timeArray[] = $t2 - $t1; 
}

如果严格的不等价比较两个而不是全等的

//Method 1 faster than Method 2 by 1.5-2 times, but now less of a difference 
for($i = 0; $i < 100; $i++) { 
    $t1 = microtime(true); 
    $a = $x . $i; 
    $b = $y . $i; 
    for($j = 0; $j < 100000; $j++) { 
     if ($a !== $b) continue; // using inequivalence this time 
    } 
    $t2 = microtime(true); 
    $timeArray[] = $t2 - $t1; 
} 

//Method 2 
for($i = 0; $i < 100; $i++) { 
    $t1 = microtime(true); 
    $a = $x . $i; 
    $b = $y . $i; 
    for($j = 0; $j < 100000; $j++) { 
     if ($a[0] !== $b[0]) continue; // using inequivalence this time 
    } 
    $t2 = microtime(true); 
    $timeArray[] = $t2 - $t1; 
}

来源

2016-05-16 The One and Only ChemistryBlob

什么版本的PHP？字符串是硬编码的，从文件读取还是动态生成？ – axiac

最有可能是由于zval引用效应 – hindmost

这个用例只有硬编码...在我的应用程序字符串动态生成 –

静态字符串，就像在你的脚本也得到了相同的结果，will be interned（见String Interning在维基百科的详细说明什么这意味着）。

基本上这意味着相同的字符串只会在内存中存储一次。当PHP进行比较时，它会立即看到两个字符串都指向内存中的同一个对象，不需要做任何进一步的检查。比较字符串中的单个字符很可能不会从这种优化中受益，这就是为什么它们可能需要更长时间。

很有可能还有其他因素发挥作用，但那将是一个主要因素。尝试构建动态的字符串中的一个或两个，看看有多少改变，像这样的代码，结果的变化：

$x = base64_encode(base64_decode('A really really looooooooooooong string'));

正如评论答应了，这里是一个版本的脚本，它覆盖两个字符串实习以及任何可能正在使用的平等缓存。

我在这里得到的结果表明第二种方法速度稍快。

<?php 
$runs = 1000000; 
$input_string_a = "A really really looooooooooooong string"; 
$input_string_b = "B really really looooooooooooong string"; 

$total_time = 0; 
for($i=0; $i<$runs; $i++) { 
    $a = substr($input_string_a, 0); 
    $b = substr($input_string_b, 0); 
    $start = microtime(true); 
    if($a === $b) { 
     if(false) break; 
    } 
    $end = microtime(true); 
    $total_time += $end - $start; 
} 

echo $total_time."\n"; 

$total_time = 0; 
for($i=0; $i<$runs; $i++) { 
    $a = substr($input_string_a, 0); 
    $b = substr($input_string_b, 0); 
    $start = microtime(true); 
    if($a[0] === $b[0]) { 
     if(false) break; 
    } 
    $end = microtime(true); 
    $total_time += $end - $start; 
} 

echo $total_time."\n";

来源

2016-05-16 20:55:28 Chris

我更新$ x为md5（'原始字符串'）和$ y到是'原始字符串'。时间（）和结果基本相同....非等价字符串我知道但仍然是相同的结果 –

这里是我得到的结果。对于你的原始代码（编译时匹配的字符串）'0.008，0.026'。我的修改（字符串匹配，不在编译时确定）'0.021 0.030'。如果字符串在编译时和运行时都不匹配（第一个字符），我也得到了相同的结果。 – Chris

这表明字符串interning是一个重要的因素，但可能还有其他字符串比较的优化使其更快。 – Chris

PHP - 为什么比较两个整个长（相同）的字符串比比较每个字符的第一个字符要快得多？

回答

相关问题