2012-02-03 49 views
4

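Finding matching brackets in a long text with PHP / regular expressions

I have been trying to wrap my head around this for quite a while, but I still haven't found a solution.

I'm working on a simple formatting scheme in which I need tags that contain a string inside braces, with the tag name placed directly in front of the opening brace. Tags should also be able to appear inside other braces.

The string: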

This is some random text, tag1{while this is inside a tag2{tag}}. This is some 
other text tag2{also with a tag tag3{inside} of it}. 

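What I want to do now is get the content of each

tag1{} 
tag2{} 
tag3{} 

I have found others with a similar problem (Find matching brackets using regular expression), but their question focused on how to find a matching bracket inside other brackets, whereas my problem is both that and finding multiple bracketed tags in a longer text.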

Answers

1

The regular expression would look like this:

tag[0-9]+\{[^\}]+ 

You should then replace the first inner tag.
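As a quick check of what this pattern captures, the sketch below runs it over the question's sample string (joined onto one line here). Because `[^\}]+` also accepts `{`, each match runs from a tag name up to the first closing brace, so it spans into the nested tag instead of isolating it. The variable names are only illustrative.

```php
<?php
// Sample string from the question, on a single line.
$text = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag2{also with a tag tag3{inside} of it}.";

// The pattern from this answer: a tag name, "{", then everything that is not "}".
preg_match_all('/tag[0-9]+\{[^\}]+/', $text, $m);

print_r($m[0]);
// [0] => tag1{while this is inside a tag2{tag
// [1] => tag2{also with a tag tag3{inside
```

The nested tags are swallowed by the outer matches, which is why the other answers rely on recursion or on counting braces.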

2

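I don't know whether there is a single regular expression that gets all of your inner and outer tags in one call, but you can take the regex /\{(([^\{\}]+)|(?R))*\}/ from the question you linked and iterate recursively over the results.

I added your tag names and some named subpatterns to make the regex clearer: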

function search_tags($string, $recursion = 0) { 
    $Results = array(); 
    // tagname: the word before "{"; content: everything up to the matching "}".
    // (?R) recurses into the whole pattern, so nested tagname{...} pairs stay balanced.
    if (preg_match_all("/(?<tagname>[\w]+)\{(?<content>(([^\{\}]+)|(?R))*)\}/", $string, $matches, PREG_SET_ORDER)) { 
        foreach ($matches as $match) { 
            $Results[] = array('match' => $match[0], 'tagname' => $match['tagname'], 'content' => $match['content'], 'deepness' => $recursion); 
            // Search the content for nested tags, one level deeper.
            if ($InnerResults = search_tags($match['content'], $recursion+1)) { 
                $Results = array_merge($Results, $InnerResults); 
            } 
        } 
        return $Results; 
    } 
    return false; // no tags found in $string
} 

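This function returns an array that holds, for every match, the whole match, the tag name, the content of the braces, and a counter showing how deeply the match is nested inside other tags. I added another level of nesting to your string for demonstration: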

$text = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag3{also with a tag tag4{and another nested tag5{inside}} of it}."; 
echo '<pre>'.print_r(search_tags($text), true).'</pre>'; 

The output will be:

Array 
(
    [0] => Array 
     (
      [match] => tag1{while this is inside a tag2{tag}} 
      [tagname] => tag1 
      [content] => while this is inside a tag2{tag} 
      [deepness] => 0 
     ) 

    [1] => Array 
     (
      [match] => tag2{tag} 
      [tagname] => tag2 
      [content] => tag 
      [deepness] => 1 
     ) 

    [2] => Array 
     (
      [match] => tag3{also with a tag tag4{and another nested tag5{inside}} of it} 
      [tagname] => tag3 
      [content] => also with a tag tag4{and another nested tag5{inside}} of it 
      [deepness] => 0 
     ) 

    [3] => Array 
     (
      [match] => tag4{and another nested tag5{inside}} 
      [tagname] => tag4 
      [content] => and another nested tag5{inside} 
      [deepness] => 1 
     ) 

    [4] => Array 
     (
      [match] => tag5{inside} 
      [tagname] => tag5 
      [content] => inside 
      [deepness] => 2 
     ) 

) 
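
To illustrate how the 'deepness' field from the output above might be used, here is a minimal sketch that prints the tags as an indented outline. `search_tags()` is repeated from this answer so the block runs on its own; `tag_outline()` is a hypothetical helper that is not part of the answer.

```php
<?php
// search_tags() as given in this answer.
function search_tags($string, $recursion = 0) {
    $Results = array();
    if (preg_match_all("/(?<tagname>[\w]+)\{(?<content>(([^\{\}]+)|(?R))*)\}/", $string, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $Results[] = array('match' => $match[0], 'tagname' => $match['tagname'], 'content' => $match['content'], 'deepness' => $recursion);
            if ($InnerResults = search_tags($match['content'], $recursion + 1)) {
                $Results = array_merge($Results, $InnerResults);
            }
        }
        return $Results;
    }
    return false;
}

// Hypothetical helper: one line per tag, indented two spaces per nesting level.
function tag_outline($results) {
    $lines = array();
    foreach ($results ?: array() as $r) { // search_tags() returns false when nothing matched
        $lines[] = str_repeat('  ', $r['deepness']) . $r['tagname'];
    }
    return implode("\n", $lines);
}

$text = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag3{also with a tag tag4{and another nested tag5{inside}} of it}.";
echo tag_outline(search_tags($text)), "\n";
// tag1
//   tag2
// tag3
//   tag4
//     tag5
```

Because each tag's nested results are merged directly after the tag itself, the flat array is already in document order and the outline needs no sorting.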
3

If the tags are always balanced, you can use an expression like this one to get the content and the names of all tags, including nested ones.

\b(\w+)(?={((?:[^{}]+|{(?2)})*)}) 

Example

$str = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag2{also with a tag tag3{inside} of it}."; 

$re = "/\\b(\\w+)(?={((?:[^{}]+|{(?2)})*)})/"; 
preg_match_all($re, $str, $m); 

echo "* Tag names:\n"; 
print_r($m[1]); 
echo "* Tag content:\n"; 
print_r($m[2]); 

Output:

* Tag names: 
Array 
(
    [0] => tag1 
    [1] => tag2 
    [2] => tag2 
    [3] => tag3 
) 
* Tag content: 
Array 
(
    [0] => while this is inside a tag2{tag} 
    [1] => tag 
    [2] => also with a tag tag3{inside} of it 
    [3] => inside 
) 
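
To make the expression in this answer easier to read, here is a sketch of the same pattern written in extended mode (the `x` flag) with a comment on each part. The braces are escaped here for readability; the layout and comments are additions, but the pattern is meant to behave the same way. The tag name is matched normally, while the braces and content sit inside a lookahead, so nothing after the name is consumed and nested tags can still be matched afterwards.

```php
<?php
$re = '/
    \b (\w+)            # group 1: the tag name
    (?=                 # lookahead, so the braces are checked but not consumed
        \{
        (               # group 2: the content between the braces
            (?:
                [^{}]+      # a run of non-brace characters
              | \{ (?2) \}  # or a nested brace pair, recursing into group 2
            )*
        )
        \}
    )
/x';

$str = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag2{also with a tag tag3{inside} of it}.";
preg_match_all($re, $str, $m);

print_r($m[1]); // tag names
print_r($m[2]); // tag contents
```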
+2

+1 for the recursive subpattern. – cmbuckley 2012-02-03 13:16:15

+0

@cbuckley, both of the original expressions work. The first original one was shorter and nicer, but it includes the surrounding "{}" in the capture. – Qtax 2012-02-05 03:28:42

+0

Agreed; I was just making the regex and the example consistent. Feel free to revert my change, or edit to mention both regexes. – cmbuckley 2012-02-06 00:20:23

0

I don't think there is any other way. You need to iterate over every bracket.

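// $input is assumed to hold the text to scan.
$output = array(); 
$pos = 0; 
// Find the next opening "tagN{" at or after $pos.
while (preg_match('/tag\d+\{/S', $input, $match, PREG_OFFSET_CAPTURE, $pos)) { 
    $start = $match[0][1]; 
    // Resume the outer search right after this "{" so nested tags are found too.
    $pos = $offset = $start + strlen($match[0][0]); 
    $bracket = 1; 
    // Walk from brace to brace until the opening brace is balanced.
    while ($bracket !== 0 and preg_match('/\{|\}/S', $input, $found, PREG_OFFSET_CAPTURE, $offset)) { 
        ($found[0][0] === '}') ? $bracket-- : $bracket++; 
        $offset = $found[0][1] + 1; 
    } 
    $output[] = substr($input, $start, $offset - $start); 
}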
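
The brace-counting loop above can be wrapped in a function and tried on the question's string. This is a minimal sketch: the name `extract_tags()` is hypothetical, and skipping tags whose braces never close is an assumption added here, not something the answer specifies.

```php
<?php
// Hypothetical wrapper around the brace-counting loop from this answer.
function extract_tags($input) {
    $output = array();
    $pos = 0;
    while (preg_match('/tag\d+\{/', $input, $match, PREG_OFFSET_CAPTURE, $pos)) {
        $start = $match[0][1];
        // Continue the outer search just after "{" so nested tags are reported too.
        $pos = $offset = $start + strlen($match[0][0]);
        $depth = 1;
        while ($depth !== 0 && preg_match('/[{}]/', $input, $found, PREG_OFFSET_CAPTURE, $offset)) {
            $depth += ($found[0][0] === '{') ? 1 : -1;
            $offset = $found[0][1] + 1;
        }
        // Assumption: an unbalanced tag (depth never returns to 0) is skipped.
        if ($depth === 0) {
            $output[] = substr($input, $start, $offset - $start);
        }
    }
    return $output;
}

$text = "This is some random text, tag1{while this is inside a tag2{tag}}. This is some other text tag2{also with a tag tag3{inside} of it}.";
print_r(extract_tags($text));
// [0] => tag1{while this is inside a tag2{tag}}
// [1] => tag2{tag}
// [2] => tag2{also with a tag tag3{inside} of it}
// [3] => tag3{inside}
```

Unlike the recursive patterns, this approach returns each tag together with its name and braces as one string, in the order the opening braces appear.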