Java Scanner hasNext(String) method sometimes does not match

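I am trying to use the Java Scanner hasNext method, but I am getting strange results. Maybe my problem is obvious, but why does the simple pattern "[a-zA-Z']+" not work for words like these: "points, anything, supervisor"? I have also tried "[\\w']+".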

public HashMap<String, Integer> getDocumentWordStructureFromPath(File file) {
    HashMap<String, Integer> dictionary = new HashMap<>();
    try {
        Scanner lineScanner = new Scanner(file);
        while (lineScanner.hasNextLine()) {
            Scanner scanner = new Scanner(lineScanner.nextLine());
            while (scanner.hasNext("[\\w']+")) {
                String word = scanner.next().toLowerCase();
                if (word.length() > 2) {
                    int count = dictionary.containsKey(word) ? dictionary.get(word).intValue() + 1 : 1;
                    dictionary.put(word, new Integer(count));
                }
            }
            scanner.close();
        }
        //scanner.useDelimiter(DELIMITER);
        lineScanner.close();

        return dictionary;

    } catch (FileNotFoundException e) {
        e.printStackTrace();
        return null;
    }
}

Source

2013-04-07 flatronka

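Your regex should be something like [^a-zA-Z]+, used as the delimiter, because you need to split on everything that is not a letter. (If you want to keep apostrophes inside words, as in your original pattern, use [^a-zA-Z']+ instead.)

// previous code...
    Scanner scanner = new Scanner(lineScanner.nextLine()).useDelimiter("[^a-zA-Z]+");
    while (scanner.hasNext()) {
        String word = scanner.next().toLowerCase();
        // ...your other code
    }
} // end of while (lineScanner.hasNextLine())
// ... after code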
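
For reference, here is a minimal self-contained sketch of the delimiter approach. The class name DelimiterWordCount and the method countWords are made up for illustration, and it reads from a String instead of a File to keep it short; the length filter is the one from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class DelimiterWordCount {
    // Counts words longer than two letters.
    // Any run of non-letter characters (spaces, dots, commas...) separates words.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scanner = new Scanner(text).useDelimiter("[^a-zA-Z]+")) {
            while (scanner.hasNext()) {
                String word = scanner.next().toLowerCase();
                if (word.length() > 2) {
                    counts.merge(word, 1, Integer::sum); // insert 1 or add 1 to the existing count
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Hello World. A test, ok. Hello again!"));
    }
}
```

With this input, "hello" is counted twice, "world", "test" and "again" once each, and "A" and "ok" are dropped by the length check.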

EDIT -- Why doesn't it work with the hasNext(String) method?

This line:

Scanner scanner = new Scanner(lineScanner.nextLine());

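What it really does is set up the default whitespace pattern as the delimiter for you, so if you have, for example, this test line "Hello World. A test, ok." it will give you these tokens:

Hello
World.
A
test,
ok.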
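
A quick way to see those tokens for yourself is a small sketch like this one (DefaultTokensDemo and tokens are illustrative names, not part of the original code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DefaultTokensDemo {
    // Collects the tokens produced by a Scanner with its default (whitespace) delimiter.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Hello, World., A, test,, ok.]
        System.out.println(tokens("Hello World. A test, ok."));
    }
}
```

Note that the punctuation stays attached to the tokens, because only whitespace is treated as a separator.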

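Then, if you use scanner.hasNext("[a-zA-Z]+"), you are asking the scanner whether the next token, as a whole, matches your pattern. In this example it says true for the first token:

Hello (because this is the first token, and it matches the specified pattern)

The next token (World.) doesn't match the pattern, so scanner.hasNext("[a-zA-Z]+") returns false. Your while loop ends at that point, and the rest of the line is never read. So it is never going to work for a word that has any non-letter character attached to it. You got it?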
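
To make the difference concrete, here is a small sketch (HasNextPatternDemo and its method names are made up for illustration). The first method mirrors the loop from the question; the second is one possible variant, not the fix suggested above, that visits every token and simply skips the ones that do not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class HasNextPatternDemo {
    // Mirrors the question's loop: stops at the first token that does not fully match.
    static List<String> stopAtFirstMismatch(String line) {
        List<String> words = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            while (scanner.hasNext("[a-zA-Z]+")) {
                words.add(scanner.next());
            }
        }
        return words;
    }

    // Variant: walk over all tokens and discard those that do not match.
    static List<String> skipMismatches(String line) {
        List<String> words = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            while (scanner.hasNext()) {
                if (scanner.hasNext("[a-zA-Z]+")) {
                    words.add(scanner.next());
                } else {
                    scanner.next(); // consume and drop the non-matching token
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "Hello World. A test, ok.";
        System.out.println(stopAtFirstMismatch(line)); // prints [Hello]
        System.out.println(skipMismatches(line));      // prints [Hello, A]
    }
}
```

Even the second variant loses "World.", "test," and "ok." entirely, which is why changing the delimiter is usually the more practical choice for counting words.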

Anyway... hope this helps.

Source

2013-04-07 17:48:24

Thank you very much @Angel Rodriguez, this is a good solution, but I still don't know why it doesn't work with the hasNext(String) method. – flatronka 2013-04-07 18:02:34

OK, I see what you mean. I have edited my answer and explained why it doesn't work... hope it helps... – 2013-04-07 18:35:39

Thank you very much, I've got it now. Thanks a lot for your help. +1 for the detailed explanation. – flatronka 2013-04-07 23:00:24
