Regex to match whitespace, but skip sections

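I understand that because regex is essentially stateless, it is rather difficult to achieve complicated matches without resorting to supplementary application logic, but I'm curious whether the following is possible.

Matching all whitespace is easy enough: \s+

But skip whitespace between certain delimiters, in my case ~~<pre> and </pre>~~ the word nostrip.

Are there any tricks to achieve this? I was thinking along the lines of two separate matches, one for all whitespace and one for ~~<pre> blocks~~ nostrip sections, and somehow negating the latter from the former.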

"This is some text NOSTRIP this is more text NOSTRIP some more text." 
// becomes 
"ThisissometextNOSTRIP this is more text NOSTRIPsomemoretext."

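The tags used to mark the NOSTRIP sections are irrelevant, and I'm not trying to parse ~~a nested tree of~~ HTML or anything, just tidying a text file while preserving the whitespace in ~~<pre> blocks~~ nostrip sections, for obvious reasons.

(Better?)

This is what I ultimately went with. I'm sure it can be optimized in a few places, but it works nicely for now.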

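public function stripWhitespace($html, array $skipTags = array('pre')){
    // Build one pattern per tag that matches the whole element, e.g. <pre ...>...</pre>.
    // preg_quote guards against regex metacharacters in a tag name.
    $patterns = array();
    foreach($skipTags as $tag){
        $tag = preg_quote($tag, '#');
        $patterns[] = "<{$tag}\\b.*?</{$tag}>";
    }
    // Swap each skipped element for a numbered placeholder delimited by \x1D.
    $skipped = array();
    $buffer = preg_replace_callback('#(?<tag>' . implode('|', $patterns) . ')#si',
        function($match) use(&$skipped){
            $skipped[] = $match['tag'];
            return "\x1D" . (count($skipped) - 1) . "\x1D";
        }, $html
    );
    // Collapse whitespace runs, then drop whitespace that touches a tag.
    $buffer = preg_replace('#\s+#', ' ', $buffer);
    $buffer = preg_replace('#(?:(?<=>)\s|\s(?=<))#', '', $buffer);
    // Put the skipped elements back in place of their placeholders.
    for($i = count($skipped) - 1; $i >= 0; $i--){
        $buffer = str_replace("\x1D{$i}\x1D", $skipped[$i], $buffer);
    }
    return $buffer;
}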

Source

2011-05-12 Dan

You're using regex on HTML? Why? – 2011-05-12 20:51:51

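Actually, what you need is more complicated: the regex would also need to make sure that there is no </pre> between the <pre> and the whitespace, and vice versa. – abesto 2011-05-12 20:54:30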

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – mellamokb 2011-05-12 21:02:22

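If you're using a scripting language, I would use a multi-step approach:

1. Pull out the NOSTRIP sections, save them to an array, and replace them with a marker (### or something).
2. Replace all the whitespace.
3. Re-inject all the saved NOSTRIP snippets.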
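The three steps above can be sketched in PHP roughly as follows. This is only a minimal illustration, not code from the thread: the function name `stripOutsideNostrip` is hypothetical, and it assumes the input never contains the `\x1D` control character that is used for the placeholders.

```php
<?php
// Hypothetical helper illustrating the three steps; the name is an assumption.
function stripOutsideNostrip(string $text, string $marker = 'NOSTRIP'): string
{
    $saved  = [];
    $quoted = preg_quote($marker, '/');

    // Step 1: pull out each MARKER...MARKER section and leave a numbered placeholder.
    $text = preg_replace_callback(
        '/' . $quoted . '.*?' . $quoted . '/s',
        function (array $m) use (&$saved): string {
            $saved[] = $m[0];
            return "\x1D" . (count($saved) - 1) . "\x1D";
        },
        $text
    );

    // Step 2: remove all remaining whitespace (\x1D is not matched by \s).
    $text = preg_replace('/\s+/', '', $text);

    // Step 3: re-inject the saved sections in place of their placeholders.
    foreach ($saved as $i => $section) {
        $text = str_replace("\x1D{$i}\x1D", $section, $text);
    }
    return $text;
}

$result = stripOutsideNostrip('This is some text NOSTRIP this is more text NOSTRIP some more text.');
echo $result, PHP_EOL;
// prints: ThisissometextNOSTRIP this is more text NOSTRIPsomemoretext.
```

An unterminated NOSTRIP marker is simply treated as ordinary text here, so whitespace after it is still stripped.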

Source

2011-05-12 21:41:40 Matt

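Thanks **Matt**; that's the direction I was heading in, I was just curious how this could be achieved without multiple steps. Also, it's **PHP**. I was hoping to somehow "interrupt" the regex parsing when it hits a 'nostrip' marker, and then turn it back on when it hits another. – Dan 2011-05-12 21:46:25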
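For what it's worth, PCRE (which PHP's preg_* functions use) has backtracking control verbs that come close to this "interrupt" idea in a single pattern. The sketch below is not something proposed in the thread; it assumes the sections are delimited by the literal word NOSTRIP and that markers are not nested. The first alternative consumes a whole NOSTRIP section and then forces a failure with `(*SKIP)(*FAIL)`, so the engine resumes scanning after the section; the second alternative matches the whitespace everywhere else.

```php
<?php
// Alternative 1: match a whole NOSTRIP...NOSTRIP section, then discard the match
//                and skip past it using the PCRE verbs (*SKIP)(*FAIL).
// Alternative 2: match runs of whitespace outside those sections.
$pattern = '/NOSTRIP.*?NOSTRIP(*SKIP)(*FAIL)|\s+/s';

$input  = 'This is some text NOSTRIP this is more text NOSTRIP some more text.';
$output = preg_replace($pattern, '', $input);

echo $output, PHP_EOL;
// prints: ThisissometextNOSTRIP this is more text NOSTRIPsomemoretext.
```

The trade-off is readability: the multi-step version is easier to follow and to debug, while this one avoids placeholders altogether.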

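Also, what is a safe character (or characters) to use as a temporary delimiter? (*Read: what do you, others you know, or standard conventions use?*) I was thinking perhaps an obscure control character, like 'BEL'. – Dan 2011-05-12 21:54:18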

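I always find myself using regex in one-off situations, so it's easy to work out a string that is unique to the file. Something like "~~~" usually works. But, as you suggest, there is no foolproof string. You can only reduce the risk with a more complicated string. Try: ##~!!~!##((__# – Matt 2011-05-12 22:52:23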
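One way to avoid guessing is to check a candidate marker against the actual input before using it. The following is a hedged sketch only: the helper name `pickPlaceholder` is hypothetical, and it assumes PHP 7 or later for `random_bytes`.

```php
<?php
// Hypothetical helper: returns a marker string that does not occur in $text.
function pickPlaceholder(string $text): string
{
    do {
        // \x1D (group separator) around 8 random hex digits.
        $candidate = "\x1D" . bin2hex(random_bytes(4)) . "\x1D";
    } while (strpos($text, $candidate) !== false); // retry on the rare collision
    return $candidate;
}

$text   = 'Some text with ~~~ and ### already in it.';
$marker = pickPlaceholder($text);
```

Because the marker is verified against the input, it cannot clash with existing content, whichever characters it is built from.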

I once wrote a set of functions to reduce whitespace in HTML output:

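function minify($html) {
    if (empty($html)) {
        return $html;
    }
    // The /e modifier was removed in PHP 7, so a callback is used instead.
    // With /U the quantifiers are inverted: $1 is the text before the first <pre,
    // $3 is the first <pre...</pre> block, and $4 is everything after it.
    return preg_replace_callback('/^(.*)((<pre.*<\/pre>)(.*?))?$/Us', function ($m) {
        $pre  = isset($m[3]) ? $m[3] : '';
        $rest = isset($m[4]) ? $m[4] : '';
        // Minify the text before the block, keep the block as-is, recurse on the rest.
        return parse($m[1]) . $pre . minify($rest);
    }, $html);
}

function parse($html) {
    // Replace runs of whitespace with a single space
    $html = preg_replace('/\s+/', ' ', $html);
    // Remove spaces that are followed by either > or <
    $html = preg_replace('/ ([<>])/', '$1', $html);
    // Remove spaces that follow >
    $html = str_replace('> ', '>', $html);
    return $html;
}

$html = minify($html);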

You may have to modify it slightly to fit your needs.

Source

2011-05-12 21:46:16 Arjan

Thanks **Arjan**; I'll give it a go shortly, after trying a few other things. – Dan 2011-05-12 21:52:56
