2015-08-03 38 views
3

Java: parsing information sequentially from a file

Let's say I have a file with a structure like this:

Line 0:

354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED

Line 1:

543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED

and so on...

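Now I want to get the information that is marked in my example (see the gray background). The sequence AA is always present (and can be used as a break to skip to the next line), while the information strings vary in length.

What is the best way to parse this information? With if/then/else over a buffered reader, or is there some kind of parser that you can tell: read a number of length XYZ, then read everything as a String until you find AA, then skip the rest of the line?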

+3

What you want is called a [regular expression](https://en.wikipedia.org/wiki/Regular_expression). – m0skit0

+0

That's what I was looking for, thanks! – Flatron

+0

Are you sure "AA" cannot occur inside "Some String That Is Important"? –

Answers

1

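I would read the file line by line and match each line against a regular expression. I hope the comments in the code below are detailed enough.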

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The pattern to use
Pattern p = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");

// Read the file line by line; try-with-resources closes the reader
try (BufferedReader br = new BufferedReader(new FileReader(myFile))) {
    String line;
    while ((line = br.readLine()) != null) {
        // Match the line against our pattern
        Matcher m = p.matcher(line);
        if (m.find()) {
            // Line is valid, process it however you want
            // m.group(1) contains the number
            // m.group(2) contains the text between the number and AA
            // (including the whitespace right before AA)
        } else {
            // Line has an invalid format (pattern does not match)
        }
    }
}

Explanation of the regular expression (pattern) I used:

^([0-9]+)\s+(([^A]|A[^A])+)AA 

^                matches the start of the line
([0-9]+)         matches one or more digits (group 1)
\s+              matches one or more whitespace characters
(([^A]|A[^A])+)  matches a run of characters without two adjacent A's: each step is either a non-A, or an A followed by a non-A (group 2)
AA               matches the terminating AA
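To see what the two capture groups actually hold, here is a minimal sketch that runs the pattern against the sample line from the question. The class name `PatternDemo` and the helper `extract` are made up for this illustration. Note that group 2 keeps the whitespace in front of `AA`, so the sketch trims it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternDemo {
    // Same pattern as in the answer above
    private static final Pattern LINE =
            Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Hypothetical helper: returns {number, text}, or null if the line does not match
    static String[] extract(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        // group(2) still ends with the whitespace that precedes AA, so trim it
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] parts = extract(
                "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

Returning `null` for a non-matching line is only one option; throwing an exception or skipping the line may fit better, depending on how strict the input format is.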

Update in response to a comment:

If each line has a leading | character, the expression looks like this:

^\|([0-9]+)\s+(([^A]|A[^A])+)AA 

In a Java string literal, you need to escape the backslashes like this:

"^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA" 

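The character | has a special meaning in regular expressions (alternation), so it has to be escaped to match a literal pipe.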
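The doubled backslashes matter because a Java string literal has its own escape rules: writing `\s` directly in source code does not compile ("invalid escape sequence"), while `\\s` yields the two characters `\s` that the regex engine expects. A small sketch of the pipe-prefixed variant, with the class name `PipeLineDemo` and the method `extract` invented for this example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipeLineDemo {
    // In Java source, "\\|" is the regex \| (a literal pipe) and "\\s" is the regex \s
    private static final Pattern LINE =
            Pattern.compile("^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Hypothetical helper: returns {number, text}, or null if the line does not match
    static String[] extract(String line) {
        Matcher m = LINE.matcher(line);
        return m.find()
                ? new String[] { m.group(1), m.group(2).trim() }
                : null;
    }

    public static void main(String[] args) {
        String[] parts = extract("|543788 Another String That Is Important AA OTHER STUFF");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

A line without the leading pipe no longer matches, because `^\|` anchors the pipe to the start of the line.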

+0

Thanks for this example; now I need to look into regular expressions. – Flatron

+1

@Flatron You're welcome. I updated my answer and added an explanation of the expression. –

+0

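I have a question. I don't really want to copy and paste solutions, but it helps for learning and testing. When I copy your code I get the error `"Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\)"` for the pattern `"^([0-9]+)\s+(([^A]|A[^A])+)AA"`. Am I missing something? I imported `java.util.regex.Pattern;` but that didn't help. Is something missing after the AA? – Flatron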

1

It is impossible to tell you which approach is best for your problem without more information.

One solution could be:

String s = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED"; 
String[] split = s.substring(0, s.indexOf(" AA")).split(" ", 2); 
System.out.println("split = " + Arrays.toString(split)); 

Output:

split = [354858, Some String That Is Important] 
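One caveat with the one-liner: if a line has no ` AA` marker, `indexOf` returns -1 and `substring` throws a `StringIndexOutOfBoundsException`. A guarded sketch of the same idea follows. The class `SplitDemo` and the method `split` are hypothetical names, and using `Optional` assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.Optional;

class SplitDemo {
    // Hypothetical helper: splits "number text AA rest" into {number, text}
    static Optional<String[]> split(String s) {
        int end = s.indexOf(" AA");
        if (end < 0) {
            return Optional.empty(); // marker missing, nothing to extract
        }
        // Limit 2 keeps the spaces inside the text part intact
        String[] parts = s.substring(0, end).split(" ", 2);
        return parts.length == 2 ? Optional.of(parts) : Optional.empty();
    }

    public static void main(String[] args) {
        split("354858 Some String That Is Important AA OTHER STUFF")
                .ifPresent(p -> System.out.println("split = " + Arrays.toString(p)));
    }
}
```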
0

Here is a solution for you:

public static void main(String[] args) { 
    InputStream source; //select a text source (should be a FileInputStream) 
    { 
     String fileContent = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED\n" + 
       "543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED"; 
     source = new ByteArrayInputStream(fileContent.getBytes(StandardCharsets.UTF_8)); 
    } 

    try(BufferedReader stream = new BufferedReader(new InputStreamReader(source, StandardCharsets.UTF_8))) { 
     Pattern pattern = Pattern.compile("^([0-9]+) (.*?) AA .*$"); 
     while(true) { 
      String line = stream.readLine(); 
      if(line == null) { 
       break; 
      } 
      Matcher matcher = pattern.matcher(line); 
      if(matcher.matches()) { 
       String someNumber = matcher.group(1); 
       String someText = matcher.group(2); 
       //do something with someNumber and someText 
      } else { 
       throw new ParseException(line, 0); 
      } 
     } 
    } catch (IOException | ParseException e) { 
     e.printStackTrace(); // TODO ... 
    } 
} 
0

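You can use a regular expression, but if you know that every line contains AA and you want everything up to AA, you can simply use substring(int, int) to get the part of the line before AA: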

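public List<String> read(Path path) throws IOException { 
    // Files.lines keeps the file open, so close the stream when done 
    // (requires java.util.stream.Stream) 
    try (Stream<String> lines = Files.lines(path)) { 
        return lines.map(this::parseLine) 
                    .collect(Collectors.toList()); 
    } 
} 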

public String parseLine(String line){ 
    int index = line.indexOf("AA"); 
    return line.substring(0,index); 
} 

Here is read written with a BufferedReader instead:

public List<String> read(Path path) throws IOException { 
    List<String> content = new ArrayList<>(); 

    try(BufferedReader reader = new BufferedReader(new FileReader(path.toFile()))){ 
     String line; 
     while((line = reader.readLine()) != null){ 
      content.add(parseLine(line)); 
     } 
    } 

    return content; 
} 
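Note that `parseLine` relies on every line containing `AA`; for a line without it, `indexOf` returns -1 and `substring(0, -1)` throws. If that assumption is not guaranteed, one option is to skip such lines. A rough sketch that works on lines already held in memory (the class `PrefixDemo` and the method `prefixes` are names invented here):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PrefixDemo {
    // Hypothetical variant of read(): keeps the part before the first AA of each line
    static List<String> prefixes(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("AA"))   // skip lines without the marker
                .map(line -> line.substring(0, line.indexOf("AA")).trim())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "354858 Some String That Is Important AA OTHER STUFF",
                "a line without the marker",
                "543788 Another String That Is Important AA OTHER STUFF");
        prefixes(input).forEach(System.out::println);
    }
}
```

Like the original, this cuts at the first `AA`, so it gives a truncated result if `AA` can also appear inside the important text.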
1

For a non-Java 8 version, you can read the file line by line and drop everything from the AA charSequence onward:

final String charSequence = "AA"; 
String line; 
BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream("yourfilename"))); 
try { 
    while ((line = r.readLine()) != null) { 
     int pos = line.indexOf(charSequence); 
     if (pos > 0) { 
      String myImportantStuff = line.substring(0, pos); 
      //do something with your useful string 
     } 
    } 
} finally { 
    r.close(); 
}