2017-04-03

I want to parse this package: a regular expression for joining line-related information

WGS AUFFUELLUNGEN 
ADMIN1   23.03. 
17:09 -20- 1500.00 
17:10 JD20 560.00 
17:11 -2.0- 112.00 
ADMIN1   24.03. 
14:51 JD50 500.00 
ADMIN2   27.03. 
08:58 JD50 500.00 
---------------------- 
       3172.00 

Parsing the user and the date is easy:

\r?\n(.*)\s+(\d\d\.\d\d\.) 

Parsing the time, denomination, and amount is easy as well:

\r?\n(\d\d:\d\d)\s+(.*)\s+(\d+\.\d\d) 
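
As a rough illustration (not part of the original question), the two expressions above can be tried separately in Java. The class name `SeparatePatternsDemo`, the constant `SAMPLE`, and the helper `findAll` are made up for this sketch. One thing it makes visible: because `(.*)` is greedy, group 1 of the user/date expression keeps the trailing spaces of the user name and leaves only one space for `\s+`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatePatternsDemo {

    // The two expressions from the question, written as Java string literals.
    static final Pattern USER_DATE =
            Pattern.compile("\\r?\\n(.*)\\s+(\\d\\d\\.\\d\\d\\.)");
    static final Pattern TIME_DENOM_AMOUNT =
            Pattern.compile("\\r?\\n(\\d\\d:\\d\\d)\\s+(.*)\\s+(\\d+\\.\\d\\d)");

    // The sample package from the question (trailing spaces omitted).
    static final String SAMPLE = "WGS AUFFUELLUNGEN\n"
            + "ADMIN1   23.03.\n"
            + "17:09 -20- 1500.00\n"
            + "17:10 JD20 560.00\n"
            + "17:11 -2.0- 112.00\n"
            + "ADMIN1   24.03.\n"
            + "14:51 JD50 500.00\n"
            + "ADMIN2   27.03.\n"
            + "08:58 JD50 500.00\n"
            + "----------------------\n"
            + "    3172.00\n";

    // Hypothetical helper: returns each match as its capture groups joined by '|'.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            StringBuilder sb = new StringBuilder();
            for (int g = 1; g <= m.groupCount(); g++) {
                if (g > 1) {
                    sb.append('|');
                }
                sb.append(m.group(g));
            }
            hits.add(sb.toString());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Each expression finds its own kind of line, but nothing ties
        // a booking line to the user/date header above it.
        System.out.println(findAll(USER_DATE, SAMPLE));
        System.out.println(findAll(TIME_DENOM_AMOUNT, SAMPLE));
    }
}
```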

But I need a single expression that captures the user, date, time, denomination, and amount together for each booking.

Any ideas?

Answers


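You will need some kind of intermediate structure that you can iterate over. If you cannot change your Java code, perhaps you can use one regular expression that first matches a whole block of your sample string (a user/date line plus the booking lines below it). In a second step, you then match all the details inside that block.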

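import static org.junit.Assert.assertEquals;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;

public class RegexTestCase { 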

private static final String PACKAGE 
    = "WGS AUFFUELLUNGEN  \n" + 
    "ADMIN1   23.03.\n" + 
    "17:09 -20- 1500.00\n" + 
    "17:10 JD20 560.00\n" + 
    "17:11 -2.0- 112.00\n" + 
    "ADMIN1   24.03.\n" + 
    "14:51 JD50 500.00\n" + 
    "ADMIN2   27.03.\n" + 
    "08:58 JD50 500.00\n" + 
    "----------------------\n" + 
    "    3172.00\n"; 

private static final String NL = "\\r?\\n"; 

private static final String USER_DATE_REGEX 
= "(.*?)\\s+(\\d\\d\\.\\d\\d\\.)"; 

private static final String TIME_AMOUNT_REGEX 
= "(\\d\\d:\\d\\d)\\s+(.*?)\\s+(\\d+\\.\\d\\d)"; 

private static final String BLOCK_REGEX 
    = USER_DATE_REGEX + NL + "((" + TIME_AMOUNT_REGEX + NL + ")+)"; 


@Test 
public void testRegex() throws Exception { 
    Pattern blockPattern = Pattern.compile(BLOCK_REGEX); 
    Pattern timeAmountPattern = Pattern.compile(TIME_AMOUNT_REGEX); 

    int count = 0; 
    Matcher blockMatcher = blockPattern.matcher(PACKAGE); 
    while (blockMatcher.find()) { 
     String name = blockMatcher.group(1); 
     String date = blockMatcher.group(2); 
     String block = blockMatcher.group(3); 

     Matcher timeAmountMatcher = timeAmountPattern.matcher(block); 
     while (timeAmountMatcher.find()) { 
      String time = timeAmountMatcher.group(1); 
      String denom = timeAmountMatcher.group(2); 
      String amount = timeAmountMatcher.group(3); 

      assertEquals("wrong name", RESULTS[count].name, name); 
      assertEquals("wrong date", RESULTS[count].date, date); 
      assertEquals("wrong time", RESULTS[count].time, time); 
      assertEquals("wrong denom", RESULTS[count].denom, denom); 
      assertEquals("wrong amount", RESULTS[count].amount, amount); 
      count++; 
     } 
    } 
    assertEquals("wrong number of results", 5, count); 
} 

private static final Result[] RESULTS 
= { new Result("ADMIN1", "23.03.", "17:09", "-20-", "1500.00") 
    , new Result("ADMIN1", "23.03.", "17:10", "JD20", "560.00") 
    , new Result("ADMIN1", "23.03.", "17:11", "-2.0-", "112.00") 
    , new Result("ADMIN1", "24.03.", "14:51", "JD50", "500.00") 
    , new Result("ADMIN2", "27.03.", "08:58", "JD50", "500.00") 
    }; 

static final class Result { 
    private final String name; 
    private final String date; 
    private final String time; 
    private final String denom; 
    private final String amount; 
    Result(String name, String date, String time, String denom, String amount) { 
     this.name = name; 
     this.date = date; 
     this.time = time; 
     this.denom = denom; 
     this.amount = amount; 
    } 
} 
} 

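Yes, that is exactly the situation. The whole block (from the header to the sum) has already been parsed out of about 50K of text. Parsing the details is now the challenge: combining the user, date, time, denomination, and amount of each booking with one expression. – quero59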


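Your second regular expression is too greedy; have a look at this.

I suggest changing it to \r?\n(\d\d:\d\d)\s+(.*?)\s+(\d+\.\d\d)

This regex matches the user, date, time, denomination, and amount of every booking in one pass, but note that I have added the multiline regex flag: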

(^(.*)\s+(\d\d\.\d\d\.)$|^(\d\d:\d\d)\s+(.*)\s+(\d+\.\d\d)$)+ 
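
To see how the groups of this combined expression fill in, here is a minimal Java sketch, assuming the expression is compiled with `Pattern.MULTILINE`. The class name `CombinedPatternDemo`, the constant `SAMPLE`, and the method `parse` are invented for the illustration. Each match is either a header line (groups 2 and 3 set, groups 4 to 6 null) or a booking line (groups 4 to 6 set, groups 2 and 3 null), so the calling code still has to carry the last user and date forward itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinedPatternDemo {

    // The combined expression from the answer. MULTILINE lets ^ and $
    // match at the start and end of every line, not only of the input.
    static final Pattern COMBINED = Pattern.compile(
            "(^(.*)\\s+(\\d\\d\\.\\d\\d\\.)$|^(\\d\\d:\\d\\d)\\s+(.*)\\s+(\\d+\\.\\d\\d)$)+",
            Pattern.MULTILINE);

    // The sample package from the question (trailing spaces omitted).
    static final String SAMPLE = "WGS AUFFUELLUNGEN\n"
            + "ADMIN1   23.03.\n"
            + "17:09 -20- 1500.00\n"
            + "17:10 JD20 560.00\n"
            + "17:11 -2.0- 112.00\n"
            + "ADMIN1   24.03.\n"
            + "14:51 JD50 500.00\n"
            + "ADMIN2   27.03.\n"
            + "08:58 JD50 500.00\n"
            + "----------------------\n"
            + "    3172.00\n";

    // Hypothetical helper: one "user;date;time;denomination;amount" string per booking.
    static List<String> parse(String text) {
        List<String> bookings = new ArrayList<>();
        String user = null;
        String date = null;
        Matcher m = COMBINED.matcher(text);
        while (m.find()) {
            if (m.group(3) != null) {
                // Header line: remember user and date for the bookings that follow.
                // The greedy (.*) keeps trailing spaces, hence the trim().
                user = m.group(2).trim();
                date = m.group(3);
            } else {
                // Booking line: time, denomination, and amount are in groups 4 to 6.
                bookings.add(String.join(";", user, date,
                        m.group(4), m.group(5), m.group(6)));
            }
        }
        return bookings;
    }

    public static void main(String[] args) {
        for (String booking : parse(SAMPLE)) {
            System.out.println(booking);
        }
    }
}
```

The group numbers differ between header lines and booking lines, so this only helps where the surrounding Java code can be adapted.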

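Thx freedev, your expression works neither in our Java tool nor in online tools such as https://regex101.com/. For now I am trying to learn more about the multiline option you mentioned... – quero59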


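In my post I have just added a working example at https://regex101.com/r/yVTa5y/3 – freedev

Sorry, I missed setting the regex option. However, I need this output format: group 1 is always the user, group 2 always the date, group 3 always the time, and so on. – quero59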

  1. Split the whole string on newlines.
  2. Iterate over the lines and, for each one:

    a. Try to match the user name and date with regex1; if it matches, extract the user name and date.
    b. If regex1 does not match, try to match the time, denomination, and amount with regex2; if it matches, extract the time, denomination, and amount.
    
    
    final String userRegex = "^(\\w+)\\s+(\\d+\\.\\d+\\.)$"; 
    final String timeRegex = "^(\\d+:\\d+)\\s+([\\S]+)\\s+(\\d+\\.?\\d+)$"; 
    

Sample source:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineParser {

    public static void main(String[] args) {
        // Compile both patterns once instead of once per line.
        final Pattern userPattern = Pattern.compile("^(\\w+)\\s+(\\d+\\.\\d+\\.)$");
        final Pattern timePattern = Pattern.compile("^(\\d+:\\d+)\\s+([\\S]+)\\s+(\\d+\\.?\\d+)$");

        final String string = "WGS AUFFUELLUNGEN\n"
                + "ADMIN1   23.03.\n"
                + "17:09 -20- 1500.00\n"
                + "17:10 JD20 560.00\n"
                + "17:11 -2.0- 112.00\n"
                + "ADMIN1   24.03.\n"
                + "14:51 JD50 500.00\n"
                + "ADMIN2   27.03.\n"
                + "08:58 JD50 500.00\n"
                + "----------------------\n"
                + "    3172.00\n";

        // Accept both Unix and Windows line endings.
        String[] list = string.split("\\r?\\n");
        int cnt = 1;
        for (String s : list) {
            Matcher m = userPattern.matcher(s);
            if (m.matches()) {
                // Header line: a user name and a date start a new list.
                System.out.println("##### List " + cnt + " ######");
                System.out.println("User Name:" + m.group(1));
                System.out.println("Date :" + m.group(2));
                cnt++;
            } else {
                m = timePattern.matcher(s);
                if (m.matches()) {
                    // Booking line: time, denomination, and amount.
                    System.out.println("Time :" + m.group(1));
                    System.out.println("Denomination :" + m.group(2));
                    System.out.println("Amount :" + m.group(3));
                    System.out.println("---------------------");
                }
            }
        }
        // Lines that match neither pattern (title, separator, total) are skipped.
    }
}
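
The sample above prints header and booking fields separately. If joined records per booking are wanted, the same line-by-line idea could be sketched as follows, reusing the two regular expressions from this answer. The class `BookingCollector`, the nested `Booking` class, and the method `collect` are hypothetical names introduced only for this sketch, and trimming each line is an added assumption to tolerate stray trailing spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BookingCollector {

    // Hypothetical value class holding one fully joined booking.
    static final class Booking {
        final String user;
        final String date;
        final String time;
        final String denomination;
        final String amount;

        Booking(String user, String date, String time, String denomination, String amount) {
            this.user = user;
            this.date = date;
            this.time = time;
            this.denomination = denomination;
            this.amount = amount;
        }

        @Override
        public String toString() {
            return user + " " + date + " " + time + " " + denomination + " " + amount;
        }
    }

    // The two expressions from the answer above.
    static final Pattern USER_LINE = Pattern.compile("^(\\w+)\\s+(\\d+\\.\\d+\\.)$");
    static final Pattern TIME_LINE = Pattern.compile("^(\\d+:\\d+)\\s+([\\S]+)\\s+(\\d+\\.?\\d+)$");

    static List<Booking> collect(String text) {
        List<Booking> bookings = new ArrayList<>();
        String user = null;
        String date = null;
        for (String rawLine : text.split("\\r?\\n")) {
            String line = rawLine.trim();
            Matcher header = USER_LINE.matcher(line);
            if (header.matches()) {
                // Remember the current user and date for the lines that follow.
                user = header.group(1);
                date = header.group(2);
                continue;
            }
            Matcher detail = TIME_LINE.matcher(line);
            if (detail.matches() && user != null) {
                // Join the remembered header with this booking line.
                bookings.add(new Booking(user, date,
                        detail.group(1), detail.group(2), detail.group(3)));
            }
        }
        return bookings;
    }

    public static void main(String[] args) {
        String sample = "WGS AUFFUELLUNGEN\n"
                + "ADMIN1   23.03.\n"
                + "17:09 -20- 1500.00\n"
                + "17:10 JD20 560.00\n"
                + "17:11 -2.0- 112.00\n"
                + "ADMIN1   24.03.\n"
                + "14:51 JD50 500.00\n"
                + "ADMIN2   27.03.\n"
                + "08:58 JD50 500.00\n"
                + "----------------------\n"
                + "    3172.00\n";
        for (Booking booking : collect(sample)) {
            System.out.println(booking);
        }
    }
}
```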

Thx Rizwan. Unfortunately I cannot code anything. I need one expression that handles all bookings. I have to feed this expression to a Java tool whose code is fixed. – quero59


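This is sufficient for any data in this format, so it is not tied to this one sample. Besides, you cannot fetch every individual field the way you ask with one single regular expression in Java! –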