Regular expression capturing in a group

I want to be able to remove all instances of newlines inside <p> tags, but not outside them. For example:

<p dir="ltr">Test<br>\nA\naa</p>\n<p dir="ltr">Bbb</p>

This is the regex I came up with:

(<p[^>]*?>)(?:(.*)\n*)*(.*)(</p[^>]*?>)

which I replace with:

$1$2$3$4

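I hoped this would work, but (?:(.*)\n*)* seems to be causing problems. Is there any way to do a repeated match like this and still have a capturing group?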

Thanks in advance!
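
The behaviour described above can be reproduced with a much smaller pattern. The following is only a minimal sketch: the class name RepeatedGroupDemo, the method lastCapture and the simplified pattern (?:(\w+)\n*)* are made up for illustration and are not the asker's code. It shows that a capturing group placed inside a repeated group keeps only the text from its last iteration, which is why a single $2 cannot collect every line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class RepeatedGroupDemo {
    // Returns what group 1 holds after the repeated group has consumed the whole input.
    static String lastCapture(String input) {
        // The inner capturing group runs once per line; each run overwrites group 1.
        Matcher m = Pattern.compile("(?:(\\w+)\\n*)*").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Three lines are matched, but only the last one is still captured.
        System.out.println(lastCapture("Test\nA\naa"));
    }
}
```

Only the final line survives in the group, so the earlier lines would be lost in a replacement that relies on it.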

Source

2016-05-23 Jun

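There are two 'p' tags? Do you want the '\n' removed from each of them separately? – rock321987

Handling each 'p' tag separately is fine. It's just that I would like to replace all the '\n' inside a 'p' tag in one go. I was hoping it might be possible with a regex, without nested loops. – Jun

Wouldn't a parser be better suited to your needs? – Jan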

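Solution

You can use this expression (it works in PCRE but not in Java, because Java's regex engine does not support \K; for a Java version see below)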

(?s)(?:<p|\G(?!\A))(?:(?!<\/p>).)*?\K[\n\r]+

Regex Demo

Regex Breakdown

(?s) # Let . match newlines as well

(?:
    <p # Start at an opening <p tag, so that whatever we find is inside it
    | # Alternation (OR)
    \G(?!\A) # Or continue from where the previous match ended (but not at the start of the string)
)

(?:
    (?!<\/p>). # Match any character, as long as </p> does not start here
)*? # Repeat lazily

\K # Discard everything matched so far from the reported match

[\n\r]+ # Match one or more \n or \r
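
To see how \G chains the matches inside one paragraph, the matches can be traced one by one. This is only a sketch under a few assumptions: it uses the Java variant of the expression (a capturing group in place of \K, which Java lacks), and the class name GAnchorTrace, the method newlineOffsets and the sample string are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tracing helper, not part of the original answer.
class GAnchorTrace {
    // Java variant of the expression: group 1 stands in for \K.
    static final Pattern P = Pattern.compile(
        "(?s)((?:<p|\\G(?!\\A))(?:(?!</p>).)*?)[\\n\\r]+");

    // Returns the offset of every newline run the pattern would remove.
    static List<Integer> newlineOffsets(String html) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = P.matcher(html);
        while (m.find()) {
            offsets.add(m.end(1)); // group 1 ends where the newline run begins
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Newlines sit at offsets 4 and 6 (inside the first <p>) and at 12 and 14 (outside).
        System.out.println(newlineOffsets("<p>a\nb\nc</p>\nx\n<p>d</p>"));
    }
}
```

The first match has to start at <p; the second starts through \G, exactly where the first one ended. Once the lookahead runs into </p>, the chain breaks, and the newlines outside the paragraph are never reached.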

Java Code

With a small change, I was able to do this in Java

String line = "<p dir=\"ltr\">Test<br>\nA\naa</p>\nabcd\n<p dir=\"ltr\">Bbb</p>"; 
System.out.println(line.replaceAll("(?s)((?:<p|\\G(?!\\A))(?:(?!<\\/p>).)*?)[\\n\\r]+", "$1"));

Ideone Demo
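
For comparison, here is a self-contained sketch that puts the \G-based replacement next to a different technique: matching each <p>...</p> element as a whole and stripping the newlines in a replacement callback. The callback variant is not from the answer; it assumes Java 9 or later (for Matcher.replaceAll with a function), and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class, not part of the original answer.
class StripNewlinesInP {
    // The approach from the answer: chain matches with \G and keep group 1.
    static String withGAnchor(String html) {
        return html.replaceAll(
            "(?s)((?:<p|\\G(?!\\A))(?:(?!<\\/p>).)*?)[\\n\\r]+", "$1");
    }

    // Alternative (assumes Java 9+): match a whole <p>...</p> element,
    // then remove the newlines from the matched text in a callback.
    static String withCallback(String html) {
        Pattern p = Pattern.compile("(?s)<p\\b[^>]*>.*?</p>");
        return p.matcher(html).replaceAll(
            // quoteReplacement keeps any $ or \ in the text from being read as a group reference
            m -> Matcher.quoteReplacement(m.group().replaceAll("[\\n\\r]+", "")));
    }

    public static void main(String[] args) {
        String line = "<p dir=\"ltr\">Test<br>\nA\naa</p>\nabcd\n<p dir=\"ltr\">Bbb</p>";
        System.out.println(withGAnchor(line));
        System.out.println(withCallback(line));
    }
}
```

Both methods give the same result for the sample input. The callback version is arguably easier to read, but it does run a second regex inside each match, which is the kind of nesting the question hoped to avoid.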

Source

2016-05-23 18:28:07 rock321987

Holy... wow. This is amazing. – Jun

Damn my regex noobness! Well done rock, I was too slow to be the savior. – zec

@Jun Let me do it in Java first – rock321987
