2012-04-08
0

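I have some text wrapped in [quote][/quote] tags, and I am trying to match all the text before those tags, everything between them, and everything after them. The catch is that the tags can occur several times, though never nested inside each other. preg_match_all is giving me strange results.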

The reason I am doing this is that I want to run a filter on all the text outside those tags, regardless of how many quote blocks there are.

This is what I have to work with so far:

preg_match_all("/(^.*)\[quote\](.*?)\[\/quote\](.*)/si", $reply['msg'], $getthequotes); 

Here is the output:

Array 
(
[0] => Array 
    (
     [0] => putting some stuff before the quote 
[quote][b]Logan said[/b][br]testing this youtube link http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i][/quote] 

yep 

http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA 

adding a quote 

[quote][b]Logan said[/b][br]This is the start of the second quote http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i][/quote] 

[i]04/07/12 20:18:07: Edited by Logan(2)[/i] 
    ) 

[1] => Array 
    (
     [0] => putting some stuff before the quote 

[quote][b]Logan said[/b][br]testing this youtube link http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i][/quote] 

yep 

http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA 

adding a quote 


    ) 

[2] => Array 
    (
     [0] => [b]Logan said[/b][br]This is the start of the second quote http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i] 
    ) 

[3] => Array 
    (
     [0] => 

[i]04/07/12 20:18:07: Edited by Logan(2)[/i] 
    ) 

) 

As you can see, it does not produce the desired output. Any help would be appreciated.

+0

Ah... a markup language that isn't HTML. Surely regular expressions will finally be the right tool? – 2012-04-08 00:37:39

+0

I have custom BBCode-like tags that get parsed into HTML. All of the regex parsing is done in PHP. – 2012-04-08 00:42:53

+1

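Sorry, I was being a bit sarcastic, in reference to this [very popular fallacy](http://stackoverflow.com/a/1732454/596781). The answer is: *don't* use regular expressions, because they are not the right tool. – 2012-04-08 00:45:32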

Answers

1

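I haven't tried this, but if you only want the stuff before [quote] and after [/quote], you can do a strpos for the first occurrence of the opening quote tag. Now you know that everything before it is unquoted.

Next, use strpos again, starting from the index of that opening quote tag, to find the closing quote tag. You can discard that part.

Now do another strpos for the next quote block, starting from the position of the closing quote tag you just found. Repeat this until you reach the end.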
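The steps above can be sketched roughly as follows. This is a minimal, untested-in-production sketch: the function name `split_quotes` and the segment array layout are made up for illustration, and it assumes PHP 7 or later, lowercase tags, and no nested quotes. Unlike the description, it keeps the quoted parts instead of discarding them, so the caller can decide what to do with each segment.

```php
<?php
// Hypothetical helper: walk the string with strpos() and return a list of
// ['type' => 'text'|'quote', 'body' => string] segments in document order.
// Assumes [quote] blocks are never nested.
function split_quotes(string $text): array
{
    $open = '[quote]';
    $close = '[/quote]';
    $segments = [];
    $offset = 0;

    while (($start = strpos($text, $open, $offset)) !== false) {
        // Look for the closing tag only after the opening tag we just found.
        $end = strpos($text, $close, $start + strlen($open));
        if ($end === false) {
            break; // unclosed tag: the remainder is treated as plain text
        }
        if ($start > $offset) {
            $segments[] = ['type' => 'text', 'body' => substr($text, $offset, $start - $offset)];
        }
        $inner = $start + strlen($open);
        $segments[] = ['type' => 'quote', 'body' => substr($text, $inner, $end - $inner)];
        // Continue scanning right after the closing tag.
        $offset = $end + strlen($close);
    }

    if ($offset < strlen($text)) {
        $segments[] = ['type' => 'text', 'body' => substr($text, $offset)];
    }
    return $segments;
}

$segments = split_quotes('before [quote]first[/quote] middle [quote]second[/quote] after');
print_r($segments);
```

Each `text` segment is unquoted content and each `quote` segment is the body of one quote block, without its tags.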

+0

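Also, if you want nesting, search for the first '[/quote]' first, and then search *backwards* from there for the opening '[quote]'; that gives you the innermost quote. Format it as needed, then rinse and repeat. – mpen 2012-04-08 02:43:23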
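A rough sketch of that innermost-first idea, assuming PHP 7.1 or later. The function names `find_innermost_quote` and `render_quotes` are invented here, and turning a quote into `<blockquote>` is only a placeholder for whatever formatting is actually wanted.

```php
<?php
// Hypothetical helper: locate the innermost [quote]...[/quote] pair.
// Returns [start, length] of the whole block including tags, or null.
function find_innermost_quote(string $text): ?array
{
    $close = strpos($text, '[/quote]');
    if ($close === false) {
        return null;
    }
    // Search backwards from the first closing tag for the nearest opening tag.
    $open = strrpos(substr($text, 0, $close), '[quote]');
    if ($open === false) {
        return null; // stray closing tag, nothing to pair it with
    }
    return [$open, $close + strlen('[/quote]') - $open];
}

// Replace quotes from the inside out until none are left.
function render_quotes(string $text): string
{
    $openLen = strlen('[quote]');
    $closeLen = strlen('[/quote]');
    while (($span = find_innermost_quote($text)) !== null) {
        [$start, $length] = $span;
        $inner = substr($text, $start + $openLen, $length - $openLen - $closeLen);
        // Placeholder formatting (an assumption): wrap the body in <blockquote>.
        $text = substr_replace($text, '<blockquote>' . $inner . '</blockquote>', $start, $length);
    }
    return $text;
}

echo render_quotes('a [quote]outer [quote]inner[/quote] tail[/quote] b');
```

Every pass removes one matched pair of tags, so the loop ends once no complete pair remains.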

+0

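I need all of it. I just need to do extra processing on the non-quoted text. I suppose I can do that by saving each part in its own variable and then concatenating all the parts back together... a bit ass-backwards, but I guess it will work. – 2012-04-08 03:06:03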

+0

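Yes, it should work if you concatenate the parts back together. Sorry about that. It is a naive algorithm, but it shouldn't be too slow for your purposes. Actually, I think I got the idea from the Udacity 101 class, where they use a similar approach to parse links in an HTML page. – Gohn67 2012-04-08 03:26:22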
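For the "process only the unquoted parts, then glue everything back together" step, one possible shortcut is `preg_split` with the `PREG_SPLIT_DELIM_CAPTURE` flag instead of the manual strpos loop. This is a different technique from the one described in the answer, shown only as a sketch. The function name `filter_outside_quotes` is made up, and it assumes PHP 7 or later and non-nested quotes.

```php
<?php
// Hypothetical helper: apply $filter to everything outside [quote] blocks
// and return the reassembled string with the quote blocks untouched.
function filter_outside_quotes(string $text, callable $filter): string
{
    // The capturing group makes preg_split() keep each whole quote block.
    // With exactly one group and no PREG_SPLIT_NO_EMPTY, even indexes are
    // outside text and odd indexes are quote blocks.
    $parts = preg_split('/(\[quote\].*?\[\/quote\])/si', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return $text; // regex error: leave the input unchanged
    }
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = $filter($part);
        }
    }
    return implode('', $parts);
}

// strtoupper stands in for the real filter.
echo filter_outside_quotes('abc [quote]keep me[/quote] def', 'strtoupper');
```

The lazy `.*?` together with the `s` modifier lets a quote body span several lines while still stopping at the nearest closing tag.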

0

It can be done, but you need to make multiple passes over the string.

$string = 'putting some stuff before the quote 
[quote][b]Logan said[/b][br]testing this youtube link http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i][/quote] 

yep 

http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA 

adding a quote 

[quote][b]Logan said[/b][br]This is the start of the second quote http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i][/quote] 

[i]04/07/12 20:18:07: Edited by Logan(2)[/i]putting some stuff before the quote 

[quote][b]Logan said[/b][br]testing this youtube link http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA[br][br]did it work?[br][br][i]04/04/12 23:48:46: Edited by Logan(2)[/i][br][br][i]04/04/12 23:55:44: Edited by Logan(2)[/i][/quote] 

yep 

http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA 

adding a quote'; 

//collapse every run of whitespace into a single space
$string = preg_replace('%\s+%', " ", $string);
//break the string on a common element
$pieces = preg_split('%\[%', $string);
//now discard the elements that consist of nothing but a tag
foreach($pieces as $key=>$value):
    $value = trim($value);
    //strict comparison, because strrpos() returns false when there is no "]"
    if(strrpos($value, "]") === (strlen($value) - 1)):
        unset($pieces[$key]);
    endif;
endforeach;
//and finally strip out the tag fragments left at the start of each piece
foreach($pieces as $key=>$value):
    $pieces[$key] = preg_replace('%.*]%', "", $value);
endforeach;
print_r($pieces);

The result is an array that looks like this:

Array 
(
    [0] => putting some stuff before the quote 
    [2] => Logan said 
    [4] => testing this youtube link http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA 
    [6] => did it work? 
    [9] => 04/04/12 23:48:46: Edited by Logan(2) 
    [13] => 04/04/12 23:55:44: Edited by Logan(2) 
    [15] => yep http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA adding a quote 
    [17] => Logan said 
    [19] => This is the start of the second quote http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA 
    [21] => did it work? 
    [24] => 04/04/12 23:48:46: Edited by Logan(2) 
    [28] => 04/04/12 23:55:44: Edited by Logan(2) 
    [31] => 04/07/12 20:18:07: Edited by Logan(2) 
    [32] => putting some stuff before the quote 
    [34] => Logan said 
    [36] => testing this youtube link http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA 
    [38] => did it work? 
    [41] => 04/04/12 23:48:46: Edited by Logan(2) 
    [45] => 04/04/12 23:55:44: Edited by Logan(2) 
    [47] => yep http://www.youtube.com/watch?v=8UVNT4wvIGY&feature=g-music&context=G2db8219YMAAAAAAAAAA adding a quote 
)