RegexKitLite: Match Expression -> Match Anything Except ] -> Match ]

I'm essentially trying to replace all of the footnotes in a large text. There are various reasons why I'm doing this in Objective-C, so please assume that constraint.

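Every footnote begins with this: [Footnote

Every footnote ends with this: ]

There can be absolutely anything between these two markers, including line breaks. There will, however, never be a ] between them.

So, essentially, I want to match [Footnote, then match anything except ], until a ] is matched.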

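This is the closest I have been able to get to identifying all of the footnotes:

NSString *regexString = @"[\\[][F][o][o][t][n][o][t][e][^\\]\n]*[\\]]";

This regex manages to identify 780 of the 889 footnotes, and none of those 780 appear to be false alarms. The only ones it seems to miss are the footnotes that contain line breaks.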

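I have spent a long time on www.regular-expressions.info, particularly on the page about the dot (http://www.regular-expressions.info/dot.html). This helped me create the regex above, but I haven't really figured out how to include any character or line break except a closing bracket.

Using the following regex instead manages to capture all of the footnotes, but it captures far too much text because the * is greedy: (?s)[\\[][F][o][o][t][n][o][t][e].*[\\]]

Here is some sample text to run the regex on: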

<p id="id00082">[Footnote 1: In the history of Florence in the early part of the XVIth century <i>Piero di Braccio Martelli</i> is frequently mentioned as <i>Commissario della Signoria</i>. He was famous for his learning and at his death left four books on Mathematics ready for the press; comp. LITTA, <i>Famiglie celebri Italiane</i>, <i>Famiglia Martelli di Firenze</i>.—In the Official Catalogue of MSS. in the Brit. Mus., New Series Vol. I., where this passage is printed, <i>Barto</i> has been wrongly given for Braccio.</p> 

    <p id="id00083">2. <i>addi 22 di marzo 1508</i>. The Christian era was computed in Florence at that time from the Incarnation (Lady day, March 25th). Hence this should be 1509 by our reckoning.</p> 

    <p id="id00084">3. <i>racolto tratto di molte carte le quali io ho qui copiate</i>. We must suppose that Leonardo means that he has copied out his own MSS. and not those of others. The first thirteen leaves of the MS. in the Brit. Mus. are a fair copy of some notes on physics.]</p> 

    <p id="id00085">Suggestions for the arrangement of MSS treating of particular subjects.(5-8).</p> 

When you put together the science of the motions of water, remember to include under each proposition its application and use, in order that this science may not be useless.-- 

[Footnote 2: A comparatively small portion of Leonardo's notes on water-power was published at Bologna in 1828, under the title: "_Del moto e misura dell'Acqua, di L. da Vinci_".]

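In this example there are two footnotes and some non-footnote text. As you can see, the first footnote contains two line breaks. The second contains no line breaks.

The first regex mentioned above captures Footnote 2 in this sample text, but it does not capture Footnote 1 because that footnote contains line breaks.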
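The behaviour of both attempts can be reproduced on a shortened stand-in for the sample text. This is only a sketch: it uses Python's `re` module rather than RegexKitLite, on the assumption that the constructs involved (character classes, `(?s)`, a greedy `*`) behave the same way in both engines. The `text` string is a made-up abbreviation of the sample, not the real data.

```python
import re

# Hypothetical, shortened stand-in for the sample text.
text = (
    "<p>[Footnote 1: first paragraph.</p>\n"
    "\n"
    "<p>2. second paragraph.]</p>\n"
    "\n"
    "<p>Body text that is not a footnote.</p>\n"
    "\n"
    "[Footnote 2: a single-line footnote.]\n"
)

# First attempt: the class [^\]\n] also excludes newlines, so a footnote
# that spans several lines never reaches its closing bracket.
no_newline = re.compile(r"[\[][F][o][o][t][n][o][t][e][^\]\n]*[\]]")

# Second attempt: (?s) lets the dot match newlines, but the greedy .*
# runs on to the last ] in the whole text.
greedy = re.compile(r"(?s)[\[][F][o][o][t][n][o][t][e].*[\]]")

first_hits = no_newline.findall(text)
greedy_hits = greedy.findall(text)

print(len(first_hits))   # number of footnotes found by the first attempt
print(len(greedy_hits))  # number of (oversized) matches from the second
```

On this stand-in the first pattern finds only the single-line footnote, and the second returns one match that also swallows the body text between the two footnotes.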

Any improvements to my regex would be greatly appreciated.

Source

2010-12-03 Xander Dunn

Try

@"\\[Footnote[^\\]]*\\]";

This should match across line breaks. There is no need to put each individual character into its own character class.
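A quick way to check this claim is to run the pattern on a small multi-line input. The sketch below uses Python's `re` module instead of RegexKitLite, assuming that a negated character class behaves identically in both; the `text` string is a hypothetical, shortened stand-in for the asker's data.

```python
import re

# Hypothetical, shortened stand-in for the asker's text.
text = (
    "<p>[Footnote 1: first paragraph.</p>\n"
    "\n"
    "<p>2. second paragraph.]</p>\n"
    "\n"
    "<p>Body text that is not a footnote.</p>\n"
    "\n"
    "[Footnote 2: a single-line footnote.]\n"
)

# The suggested pattern, written without the Objective-C string escaping.
footnote = re.compile(r"\[Footnote[^\]]*\]")

matches = footnote.findall(text)   # every footnote, including multi-line ones
stripped = footnote.sub("", text)  # the text with all footnotes removed

print(len(matches))
print(stripped)
```

Because `[^\]]` excludes only the closing bracket, newlines count as ordinary characters and the multi-line footnote is matched as a single unit.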

As a commented, multi-line regex (without the string escaping):

\[  # match a literal [ 
Footnote # match literal "Footnote" 
[^\]]* # match zero or more characters except ] 
\]  # match ]

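Inside a character class ([...]), the caret ^ takes on a different meaning: it negates the contents of the class. So [ab] matches a or b, whereas [^ab] matches any character except a or b.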
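The two roles of the caret can be compared side by side. This is a minimal sketch in Python's `re` module, assumed here to treat these constructs the same way as the engine behind RegexKitLite; the `sample` string is invented for the illustration.

```python
import re

sample = "abc\nbad"

# Inside a class: [ab] matches a or b ...
in_class = re.findall(r"[ab]", sample)

# ... and [^ab] matches every other character, the newline included.
negated = re.findall(r"[^ab]", sample)

# Outside a class, ^ is an anchor. With the multi-line flag (?m) it
# matches at the start of each line, so ^b finds only a line-initial b.
anchored = re.findall(r"(?m)^b", sample)

print(in_class)
print(negated)
print(anchored)
```

Only the position of the caret decides which meaning applies: directly after the opening bracket it negates the class, anywhere outside a class it anchors the match.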

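Of course, if you have nested footnotes, this will malfunction. Text like [Footnote foo [footnote bar] foo] will be matched from the start only up to bar]. To avoid this, change the regex to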

@"\\[Footnote[^\\]\\[]*\\]";

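so that neither opening nor closing brackets are allowed inside a match. Then, of course, you only match the innermost footnotes, and you have to apply the same regex to the whole text twice (or more, depending on the maximum nesting level), "peeling" it back layer by layer.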
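The layer-by-layer peeling can be sketched as a loop that repeats the substitution until nothing is left to replace. This is an illustration in Python's `re` module, not RegexKitLite code, and the nested `text` is hypothetical; both levels use a capital F so that the case-sensitive pattern can match each of them.

```python
import re

# Hypothetical nested input.
text = "before [Footnote outer [Footnote inner] still outer] after"

# Matches only footnotes that contain no further brackets, i.e. the
# innermost ones.
innermost = re.compile(r"\[Footnote[^\]\[]*\]")

passes = 0
while True:
    # subn returns the new string and the number of replacements made.
    text, replaced = innermost.subn("", text)
    if replaced == 0:
        break  # no footnotes left at any nesting level
    passes += 1

print(passes)
print(text)
```

Looping until the replacement count reaches zero avoids having to know the maximum nesting depth in advance.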

Source

2010-12-03 18:59:16

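This seems to work. It matches 883 times, but it replaces all of the footnotes (889), so apparently there are 6 cases where it swallows two footnotes instead of one. Maybe there are three nested footnotes? It will take me a while to find them. Why does this work? I don't understand how [^\\]]* works. Shouldn't it only look for lines that start with a closing bracket? I thought the ^ character was supposed to "match at the beginning of a line". – 2010-12-03 21:14:19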
