Regular expression

Possible duplicate:
Java - regular expression finding comments in code

How can I find comments in Java code with a regular expression? I mean comments like // and /* */.

Source

2011-03-09 kiran

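If you type in an actual title, the system runs a search for you, so you won't post an unnecessary duplicate. – 2011-03-09 14:12:21

See the earlier question: Java - regular expression finding comments in code, or this link, which turned up in a related Google search: http://ostermiller.org/findcomment.html

Source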

2011-03-09 07:57:47 ilalex

Try this:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    // comment 1

    /*
    comment 2
    // no line comment
    */

    char c = '"'; // comment 3, " is not the start of a string literal!

    String s = "/* no comment */ ... /*";

    String t = "*/ also // not a comment";

    private static String getContentsOf(String fileName) throws FileNotFoundException {
        StringBuilder b = new StringBuilder();
        try (Scanner scan = new Scanner(new File(fileName))) {
            while (scan.hasNextLine()) {
                b.append(scan.nextLine()).append("\n");
            }
        }
        return b.toString();
    }

    public static void main(String[] args) throws FileNotFoundException {
        String anyChar = "[\\s\\S]";
        String singleLineComment = "//[^\r\n]*";
        String multiLineComment = "/\\*" + anyChar + "*?\\*/";
        String stringLiteral = "\"(?:\\\\.|[^\"\r\n\\\\])*\"";
        String charLiteral = "'(?:\\\\.|[^'\r\n\\\\])+'";

        String regex = String.format("(%s)|(%s)|(%s)|(%s)|(%s)",
                singleLineComment, // group 1
                multiLineComment, // group 2
                stringLiteral,  // group 3
                charLiteral,  // group 4
                anyChar);   // group 5

        Matcher m = Pattern.compile(regex).matcher(getContentsOf("Test.java"));

        while (m.find()) {
            String matched = m.group();
            if (m.group(1) != null || m.group(2) != null) {
                System.out.println("matched = " + matched);
            }
        }
    }
}

It prints:

matched = // comment 1
matched = /*
    comment 2
    // no line comment
    */
matched = // comment 3, " is not the start of a string literal!
matched = // group 1
matched = // group 2
matched = // group 3
matched = // group 4
matched = // group 5
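The reason groups 3 to 5 exist is that the matcher has to consume string and character literals itself, otherwise a // or /* inside a literal would be reported as a comment. Here is a minimal sketch of the same idea applied to an in-memory string instead of a file. The class name CommentFinder and the method findComments are made up for this illustration, and the pattern does not attempt to handle text blocks (""") or Unicode escapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentFinder {

    // Comments are captured; literals and all other characters are matched but ignored.
    private static final Pattern TOKEN = Pattern.compile(
            "(//[^\r\n]*)"                         // group 1: line comment
            + "|(/\\*[\\s\\S]*?\\*/)"              // group 2: block comment
            + "|\"(?:\\\\.|[^\"\r\n\\\\])*\""      // string literal, consumed only
            + "|'(?:\\\\.|[^'\r\n\\\\])+'"         // char literal, consumed only
            + "|[\\s\\S]");                        // anything else, one char at a time

    /** Returns every comment found in the given source text, in order of appearance. */
    public static List<String> findComments(String source) {
        List<String> comments = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            if (m.group(1) != null || m.group(2) != null) {
                comments.add(m.group());
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        String source = "String s = \"/* no comment */\"; // real comment\n"
                + "char c = '\"'; /* another */";
        findComments(source).forEach(System.out::println);
    }
}
```

Because alternatives are tried left to right at each position, a quote always starts a literal match before any comment marker inside it can be seen.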

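Alternatively, a perhaps more robust solution is to use a small parser or a parser generator. ANTLR has a nice option that lets you define the grammar for only part of a language and ignore the rest. I demonstrated this in this previous Q&A. The downside is that you need to learn a bit of ANTLR ...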
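To give an idea of what a "small parser" could look like without ANTLR, here is a rough hand-written scanner. It is only a sketch under simple assumptions: the names CommentScanner, scan and skipLiteral are invented for this example, it ignores Unicode escapes and text blocks, and it is not the ANTLR approach from the linked Q&A.

```java
import java.util.ArrayList;
import java.util.List;

public class CommentScanner {

    /** Walks the source once, skipping literals and collecting comments. */
    public static List<String> scan(String src) {
        List<String> comments = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                i = skipLiteral(src, i, c);
            } else if (src.startsWith("//", i)) {
                int end = i;
                while (end < n && src.charAt(end) != '\n' && src.charAt(end) != '\r') {
                    end++;
                }
                comments.add(src.substring(i, end));
                i = end;
            } else if (src.startsWith("/*", i)) {
                int close = src.indexOf("*/", i + 2);
                int end = (close < 0) ? n : close + 2; // unterminated comment runs to end of input
                comments.add(src.substring(i, end));
                i = end;
            } else {
                i++;
            }
        }
        return comments;
    }

    /** Returns the index just past the literal that starts at 'start' with the given quote. */
    private static int skipLiteral(String src, int start, char quote) {
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2; // skip the backslash and the character it escapes
            } else if (c == quote || c == '\n' || c == '\r') {
                return i + 1; // closing quote, or give up at the end of the line
            } else {
                i++;
            }
        }
        return src.length();
    }

    public static void main(String[] args) {
        String src = "String s = \"// not a comment\"; // a comment\n/* block */ int x;";
        scan(src).forEach(System.out::println);
    }
}
```

The explicit state handling is longer than the regex, but each rule sits in one place, which makes it easier to extend later.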

Source

2011-03-09 08:13:11

Nice one! But what about '\u002F* */'? :P – 2011-03-09 10:18:26

@Alan: bad sport! :) I'll leave that as an exercise for the reader. – 2011-03-09 10:23:41
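One possible take on that exercise is to translate Unicode escapes to plain characters before any comment matching is done, since the Java compiler does the same as its first lexical step. The sketch below assumes that rule (a backslash that is not itself escaped, one or more u characters, then four hex digits); the class name UnicodeEscapes and the method decode are hypothetical, and malformed escapes are simply copied through rather than rejected.

```java
public class UnicodeEscapes {

    /** Replaces Unicode escapes (backslash, one or more 'u', four hex digits) with their characters. */
    public static String decode(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
                continue;
            }
            int j = i + 1;
            while (j < n && src.charAt(j) == 'u') {
                j++; // any number of 'u' characters is allowed
            }
            if (j > i + 1 && j + 4 <= n && isHex(src, j)) {
                out.append((char) Integer.parseInt(src.substring(j, j + 4), 16));
                i = j + 4;
            } else {
                // Ordinary backslash: copy it with the next character, so that the
                // second half of a double backslash is never treated as an escape start.
                out.append(c);
                if (i + 1 < n) {
                    out.append(src.charAt(i + 1));
                }
                i += 2;
            }
        }
        return out.toString();
    }

    /** True if the four characters starting at 'from' are ASCII hex digits. */
    private static boolean isHex(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            char ch = s.charAt(k);
            boolean hex = (ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'f') || (ch >= 'A' && ch <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the compiler from translating the escape in this literal.
        System.out.println(decode("\\u002F* hidden */"));
    }
}
```

The decoded text can then be handed to the comment regex. Note that offsets in the decoded text no longer line up with the original file.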

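Although this can be solved with a regular expression, the best solution for parsing any kind of structured markup is a parser that actually understands the language in question.

In this case: a Java source parser built on the Java grammar, such as javaparser, or a custom solution using ANTLR.

Source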

2011-03-09 08:16:15
