preg_match_all with nested matches

I am developing a template system and have run into a problem.

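The plan is to create HTML documents with [@tags] in them. I could just use str_replace (and loop through all the replacements), but I would like to take this a bit further ;-)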

I want to allow nested tags, and allow parameters with each tag:

[@title|You are looking at article [@articlenumber] [@articlename]]

I would like preg_match_all to give me the following result:

[0] title|You are looking at article [@articlenumber] [@articlename] 
[1] articlenumber 
[2] articlename

My script will split off the parameters at the |. The output of my script would then be something like this:

<div class='myTitle'>You are looking at article 001 MyProduct</div>
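The expansion described above can be sketched as a small recursive function. This is only an illustration in Python, not the asker's PHP script: the names expand, _expand and handlers are hypothetical, and the handler values ("001", "MyProduct", the myTitle wrapper) are taken from the example output. It assumes square brackets occur only as tag delimiters and that every tag name has a handler.

```python
def expand(text, handlers):
    """Replace every [@name] or [@name|param] tag, resolving nested tags first."""
    result, _ = _expand(text, 0, handlers, top_level=True)
    return result


def _expand(text, pos, handlers, top_level):
    parts = []
    while pos < len(text):
        if text.startswith("[@", pos):
            # Expand the tag body (including any nested tags) before applying
            # the handler of the enclosing tag.
            body, pos = _expand(text, pos + 2, handlers, top_level=False)
            name, _, param = body.partition("|")  # split name from parameter
            parts.append(handlers[name](param))
        elif text[pos] == "]" and not top_level:
            return "".join(parts), pos + 1  # end of the tag being expanded
        else:
            parts.append(text[pos])
            pos += 1
    # End of input; an unterminated tag is treated as closed here.
    return "".join(parts), pos


# Hypothetical handlers, one per tag name used in the example.
handlers = {
    "title": lambda param: "<div class='myTitle'>" + param + "</div>",
    "articlenumber": lambda param: "001",
    "articlename": lambda param: "MyProduct",
}

template = "[@title|You are looking at article [@articlenumber] [@articlename]]"
print(expand(template, handlers))
```

Because the body is expanded before it is split at the |, a nested value that itself contains a | would confuse this sketch; a real implementation would need to split before expanding.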

The problem is that I am not experienced with regular expressions. My patterns give almost what I want, but they have trouble with nested parameters.

\[@(.*?)\]

stops at the ] that closes articlenumber.
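The behaviour of the lazy pattern can be reproduced with a few lines. Python's re module is used here purely for illustration (the question itself is about PHP's preg_match_all); for this particular pattern the two engines behave the same way. The variable names are hypothetical.

```python
import re

text = "[@title|You are looking at article [@articlenumber] [@articlename]]"

# The lazy quantifier ends each match at the first "]" it can reach,
# so the outer tag is cut off inside the first nested tag.
lazy_matches = re.findall(r"\[@(.*?)\]", text)
print(lazy_matches)
# ['title|You are looking at article [@articlenumber', 'articlename']
```

The first match swallows the opening of [@articlenumber, which is why that tag never shows up as a result of its own.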

\[@(.*?)(((?R)|.)*?)\]

is more like it, but it does not capture articlenumber; https://regex101.com/r/UvH7zi/1

I hope someone can help me! Thanks in advance!

Source

2017-10-09 Remi Romme

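I believe it is time to use a proper HTML parser, such as http://simplehtmldom.sourceforge.net/ ;) Here is a summary of PCRE recursive patterns, although that approach gets out of hand quickly: http://www.rexegg.com/regex-recursion.html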

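You cannot do this with plain Python regular expressions. You are looking for a feature similar to the "balancing groups" of the .NET regex engine, which allows nested matches.

Take a look at PyParsing, which allows nested expressions through nestedExpr: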

import pyparsing as pp 
text = '{They {mean to {win}} Wimbledon}' 
print(pp.nestedExpr(opener='{', closer='}').parseString(text))

The output is:

[['They', ['mean', 'to', ['win']], 'Wimbledon']]

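Unfortunately, this does not work very well with your example. I think you need a better grammar.

You can experiment with a QuotedString definition, but still.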

import pyparsing as pp 
single_value = pp.QuotedString(quoteChar="'", endQuoteChar="'") 
parser = pp.nestedExpr(opener="[", closer="]", 
         content=single_value, 
         ignoreExpr=None) 

example = "['@title|You are looking at article' ['@articlenumber'] ['@articlename']]" 
print(parser.parseString(example, parseAll=True))

Source

2017-10-09 08:12:04 wp78de

Using your original pattern, the closest I could get to your desired output is: '\[@(.*?)(\b((?R)|.* *)*\]' – wp78de

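wp78de: this is the closest to what I described. The problem is that when another tag is embedded in the title, it is not found, because the number of parameters is not dynamic. But your answer is very close to what I need.

And I am sorry for not mentioning my programming language: I use PHP. For now I have written a parser: '- get all opening tags and put their strpos in an array - loop through all start positions of the opening tags - look for the next closing tag; is it before the next opening tag? Then the tag is complete - if the closing tag comes after an opening tag, skip that one and look for the next (and keep checking for opening tags in between)' That way I can find all complete tags and replace them. But this took about 50 lines of code and multiple loops, so a single preg_match would be great ;-)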

Here is my code:

@\w+\|[\w\s]+\[@(\w+)]\s+\[@(\w+)]

https://regex101.com/r/UvH7zi/3

Source

2017-10-09 09:04:48 minhung

For now I have written a parser:

- Get all opening tags and put their strpos in an array.
- Loop through all start positions of the opening tags.
- Look for the next closing tag: is it before the next opening tag? Then the tag is complete.
- If the closing tag comes after an opening tag, skip that one and look for the next (and keep checking for opening tags in between).

That way I can find all complete tags and replace them. But this took about 50 lines of code and multiple loops, so a single preg_match would be great ;-)
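For comparison, the same search for complete tags can be done in a single pass with a stack of open positions. This is a hedged sketch in Python, not the 50-line PHP parser described above (which is not shown in the thread); find_tags is a hypothetical name, and the sketch assumes that "]" is used only to close tags.

```python
def find_tags(text):
    """Return (start, end, body) for every balanced [@...] tag.

    Tags are reported in the order they close, so inner tags come first.
    """
    open_positions = []  # stack of indexes where a "[@" was seen
    tags = []
    i = 0
    while i < len(text):
        if text.startswith("[@", i):
            open_positions.append(i)
            i += 2
        elif text[i] == "]" and open_positions:
            # This "]" closes the most recently opened tag.
            start = open_positions.pop()
            tags.append((start, i + 1, text[start + 2:i]))
            i += 1
        else:
            i += 1
    return tags


text = "[@title|You are looking at article [@articlenumber] [@articlename]]"
# Sorting by start position lists the outer tag first.
for start, end, body in sorted(find_tags(text)):
    print(start, end, body)
```

Replacing tags in closing order (innermost first) would shift the recorded positions of the enclosing tags, so a replacement step built on this would have to re-scan or adjust offsets.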

Source

2017-10-09 12:46:27

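I am typing this on my phone, so there may be some mistakes, but what you want can be achieved quite easily by incorporating a lookahead into your expression: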

(?=\[(@(?:\[(?1)\]|.)*)\])

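Edit: yes, it works; here you go: https://regex101.com/r/UvH7zi/4

Because (?=) does not consume characters, the pattern looks for and captures the contents of every "[@*]" substring in the subject, recursively checking whether the contents themselves contain balanced groups, if any.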
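The effect of the non-consuming lookahead can be seen in isolation with a simplified pattern. Python's re module has no (?1) recursion, so this sketch spells out exactly one level of nesting by hand; it is an illustration of the lookahead idea under that assumption, not a port of the PCRE pattern above. The variable names are hypothetical.

```python
import re

text = "[@title|You are looking at article [@articlenumber] [@articlename]]"

# "[@", then non-bracket text optionally interleaved with bracketed groups
# that contain no further brackets, then the closing "]".
one_level = r"\[@([^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*)\]"

# A normal match consumes the whole outer tag, so the nested tags are skipped.
consuming = re.findall(one_level, text)

# Inside a lookahead nothing is consumed, so the search continues one
# character further each time and the nested tags are matched as well.
non_consuming = re.findall("(?=" + one_level + ")", text)

print(consuming)
print(non_consuming)
```

The consuming search returns only the outer tag, while the lookahead version returns the outer tag followed by articlenumber and articlename. Tags nested more than one level deep would need the recursive PCRE form or a hand-written scanner.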

Source

2017-10-10 18:53:50 jaytea
