2017-02-21 164 views

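How to use a regular expression to match a block from start to end

I want to capture the whole block from the start header up to the end header, but without the end header. Example: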

<section1> 
Base_Currency=EUR 
Description=Revaluation 
Grouping_File 
<section2> 

The match result should be:

<section1> 
Base_Currency=EUR 
Description=Revaluation 
Grouping_File 

My question is: how can I write a regular expression pattern in Java that produces this match?


regexr.com has helped me a lot in situations like this. It also gives you a cheat sheet. – Midnightas


Can you give an example? – QSY

Answers

1

If your input looks like the following

<section1> 
Base_Currency=EUR 
Description=Revaluation 
Grouping_File 
<section2> 
Base_Currency=EUR 
Description=Revaluation 
Grouping_File 
<section3> 
Base_Currency=EUR 
Description=Revaluation 
Grouping_File 

then you can use the following regular expression:

(?s)(<section\d+>.*?)(?=<section\d+>|$) 

Explanation of the regular expression:

NODE      EXPLANATION 
-------------------------------------------------------------------------------- 
    (?s)      set flags for this block (with . matching 
          \n) (case-sensitive) (with ^ and $ 
          matching normally) (matching whitespace 
          and # normally) 
-------------------------------------------------------------------------------- 
    (      group and capture to \1: 
-------------------------------------------------------------------------------- 
    <section     '<section' 
-------------------------------------------------------------------------------- 
    \d+      digits (0-9) (1 or more times (matching 
          the most amount possible)) 
-------------------------------------------------------------------------------- 
    >      '>' 
-------------------------------------------------------------------------------- 
    .*?      any character (0 or more times (matching 
          the least amount possible)) 
-------------------------------------------------------------------------------- 
)      end of \1 
-------------------------------------------------------------------------------- 
    (?=      look ahead to see if there is: 
-------------------------------------------------------------------------------- 
    <section     '<section' 
-------------------------------------------------------------------------------- 
    \d+      digits (0-9) (1 or more times (matching 
          the most amount possible)) 
-------------------------------------------------------------------------------- 
    >      '>' 
-------------------------------------------------------------------------------- 
    |      OR 
-------------------------------------------------------------------------------- 
    $      before an optional \n, and the end of 
          the string 
-------------------------------------------------------------------------------- 
)      end of look-ahead 
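A minimal Java sketch of how the lookahead regex above could be used, assuming the sections arrive as one string with \n line breaks. The class name LookaheadSections, the method extract and the sample input are hypothetical. group(1) returns each header plus its body, and because the lookahead does not consume the next header, the following find() starts right at it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects every <sectionN> block using the lazy .*? plus lookahead regex.
class LookaheadSections {
    // Same regex as above; inside a Java string literal every backslash is doubled.
    private static final Pattern SECTION =
            Pattern.compile("(?s)(<section\\d+>.*?)(?=<section\\d+>|$)");

    static List<String> extract(String input) {
        List<String> blocks = new ArrayList<>();
        Matcher m = SECTION.matcher(input);
        while (m.find()) {
            blocks.add(m.group(1)); // header plus body; the next header is left for the next find()
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Hypothetical sample input with two sections.
        String input = "<section1>\nBase_Currency=EUR\n<section2>\nGrouping_File";
        for (String block : extract(input)) {
            System.out.println("[" + block + "]");
        }
    }
}
```

Because the lazy .*? stops exactly at the next header, every block except the last keeps the line break that preceded that header; call trim() on each block if you don't want it.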

If you only want to match one tag at a time, up to the next '<', you can use

(?s)(<section\d+>[^<]*) 

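Explanation of this expression (the (?s) flag has no effect here: the pattern contains no '.', and the negated class [^<] matches newlines anyway):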

NODE      EXPLANATION 
-------------------------------------------------------------------------------- 
    (?s)      set flags for this block (with . matching 
          \n) (case-sensitive) (with ^ and $ 
          matching normally) (matching whitespace 
          and # normally) 
-------------------------------------------------------------------------------- 
    (      group and capture to \1: 
-------------------------------------------------------------------------------- 
    <section     '<section' 
-------------------------------------------------------------------------------- 
    \d+      digits (0-9) (1 or more times (matching 
          the most amount possible)) 
-------------------------------------------------------------------------------- 
    >      '>' 
-------------------------------------------------------------------------------- 
    [^<]*     any character except: '<' (0 or more 
          times (matching the most amount 
          possible)) 
-------------------------------------------------------------------------------- 
)      end of \1 
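The [^<]* variant can be used the same way. This is a sketch with a hypothetical class name (NegatedClassSections) and made-up input; the second input in main is there to show the trade-off of this pattern: it stops at the first '<' anywhere, so a value that contains '<' ends its block early.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: each match runs from a <sectionN> header up to (not including) the next '<'.
class NegatedClassSections {
    // The answer's regex, unchanged; (?s) is harmless but unnecessary since there is no '.'.
    private static final Pattern SECTION = Pattern.compile("(?s)(<section\\d+>[^<]*)");

    static List<String> extract(String input) {
        List<String> blocks = new ArrayList<>();
        Matcher m = SECTION.matcher(input);
        while (m.find()) {
            blocks.add(m.group(1));
        }
        return blocks;
    }

    public static void main(String[] args) {
        System.out.println(extract("<section1>\nBase_Currency=EUR\n<section2>\nfoo"));
        // Hypothetical value containing '<': the first block is cut off at "a".
        System.out.println(extract("<section1>\nDescription=a<b\n<section2>\nfoo"));
    }
}
```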
2

If your entire input is in this format, you can simply split it:

String[] sections = input.split("\\R(?=<)"); 

\R (available since Java 8) means “any line-break sequence”, and (?=<) means “the next character is '<'”.
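Here is a small sketch of that split wrapped in a method; the class name SplitSections and the sample input are hypothetical. Because the lookahead does not consume the '<', each header stays at the start of its own piece, and \R also covers Windows-style "\r\n" line endings.

```java
import java.util.Arrays;

// Hypothetical helper: splits at every line break that is immediately followed by '<'.
class SplitSections {
    static String[] split(String input) {
        // \R matches any line-break sequence (including "\r\n");
        // (?=<) only checks for '<' and does not remove it from the next piece.
        return input.split("\\R(?=<)");
    }

    public static void main(String[] args) {
        String input = "<section1>\r\nBase_Currency=EUR\r\n<section2>\r\nfoo";
        System.out.println(Arrays.toString(split(input)));
    }
}
```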

But if that is not the case, you will need these tools from the regex toolbox:

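  • the DOTALL flag, so that dot also matches newlines
  • the MULTILINE flag, so that ^ also matches at the start of each line
  • a negative lookahead, so that you stop consuming at the start of the next section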
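To see what each of these tools does on its own, here is a small sketch using the Pattern.DOTALL and Pattern.MULTILINE constants (equivalent to the inline s and m flags). The class RegexToolbox, its method names and the sample strings are hypothetical; the last method applies the negative lookahead (?!^<), so consumption stops before a line that begins with '<'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: checks the effect of each tool on tiny inputs.
class RegexToolbox {
    // DOTALL: '.' also matches line terminators such as '\n'.
    static boolean dotMatchesNewline(int flags) {
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    // MULTILINE: '^' also matches right after a line terminator.
    static boolean caretFindsSecondLine(int flags) {
        return Pattern.compile("^<", flags).matcher("x\n<section2>").find();
    }

    // Negative lookahead: take a character only if it is not followed by a line starting with '<'.
    static String consumeUntilHeader(String input) {
        Matcher m = Pattern.compile("(.(?!^<))*", Pattern.DOTALL | Pattern.MULTILINE).matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(dotMatchesNewline(0));                    // false
        System.out.println(dotMatchesNewline(Pattern.DOTALL));       // true
        System.out.println(caretFindsSecondLine(0));                 // false
        System.out.println(caretFindsSecondLine(Pattern.MULTILINE)); // true
        System.out.println(consumeUntilHeader("Grouping_File\n<section2>")); // Grouping_File
    }
}
```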

Assuming a “section” starts with a '<' at the beginning of a line:

"(?sm)^<\\w+>(.(?!^<))*" 

Here is how you use it:

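import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "<section1>\nBase_Currency=EUR\nDescription=Revaluation\nGrouping_File\n<section2>\nfoo";
Matcher matcher = Pattern.compile("(?sm)^<\\w+>(.(?!^<))*").matcher(input);
while (matcher.find()) {
    String section = matcher.group(); // one whole section, starting at its header
    System.out.println(section);
}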
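If you prefer flag constants over the inline (?sm) prefix, the same pattern can be compiled with Pattern.DOTALL | Pattern.MULTILINE and the matches collected into a list. This is a sketch; the class HeaderSections, the method sections and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: same regex as above, with flag constants instead of (?sm).
class HeaderSections {
    private static final Pattern SECTION =
            Pattern.compile("^<\\w+>(.(?!^<))*", Pattern.DOTALL | Pattern.MULTILINE);

    static List<String> sections(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = SECTION.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group()); // whole section; the next header is not included
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "<section1>\nDescription=a<b\n<section2>\nBase_Currency=EUR\n<section3>\nfoo";
        sections(input).forEach(System.out::println);
    }
}
```

Unlike the [^<]* approach, only a '<' at the beginning of a line ends a section here, so a value such as "a<b" stays intact, and the line break before the next header is not included in the match.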