How to calculate the frequency of words from a txt file - Java


I need some help with this code. I want my program to calculate the frequency of each word that matches the described pattern.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Project {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner INPUT_TEXT = new Scanner(new File("moviereview.txt")).useDelimiter(" ");

        String pattern = "[a-zA-Z'-]+";
        Pattern r = Pattern.compile(pattern);

        int occurences = 0; // one counter for all matches, not one per word

        while (INPUT_TEXT.hasNext()) {
            // read next word
            String Stringcandidate = INPUT_TEXT.next();

            // see if pattern matches (boolean find)
            Matcher m = r.matcher(Stringcandidate);
            if (m.find()) {
                occurences++; // increment occurences if pattern is found
                String moviereview = m.group(0); // retrieve found string
                String moviereview2 = moviereview.toLowerCase(); // ???

                System.out.print(moviereview2 + " appears " + occurences);
                if (occurences > 1) {
                    System.out.println(" times");
                } else {
                    System.out.println(" time");
                }
            }
        }
        INPUT_TEXT.close(); // Close your Scanner.
    }
}


2016-11-19 Naz Muh

Can you be more specific? What is happening right now? We are not here to run your code for you, and we don't have your text file. –

I can't help you. I refuse to look at code when you can't even format (indent) it properly to show its structure. – Andreas

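Welcome to StackOverflow. You are most likely to get useful answers if you follow the guidelines in the help center. For example: "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers." –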

As I said in my earlier comment, you can use a Map (HashMap) to store the matched words and how often each one occurs.
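A minimal sketch of that Map idea, assuming Java 8 or later for `Map.merge`: for a word that is not yet in the map, `merge` stores 1; otherwise it adds 1 to the stored count. `WordFrequency` and `count` are hypothetical names; the regex is the one from the question. Unlike the question's loop, this scans the whole text with `find()` instead of splitting it into tokens first, and it lowercases each word, as the question's code tried to do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordFrequency {

    // Same word pattern as in the question: letters, apostrophes and hyphens
    private static final Pattern WORD = Pattern.compile("[a-zA-Z'-]+");

    /** Counts every regex match in the text, ignoring case. */
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> frequencies = new HashMap<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            String word = matcher.group().toLowerCase();
            // store 1 for a new word, otherwise add 1 to the existing count
            frequencies.merge(word, 1, Integer::sum);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        Map<String, Integer> frequencies = count("Auto bush trumped her tomato in the petunia auto");
        frequencies.forEach((word, n) ->
                System.out.println("\"" + word + "\" appears " + n + (n > 1 ? " times" : " time")));
    }
}
```

The single `merge` call does the same job as a containsKey/get/put sequence.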

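I recommend splitting the program's functionality into smaller methods/classes, so that each one does only one small task. That makes the code easier to read.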

I assume your file contains the string "auto bush trumped her tomato in the petunia auto".

Here is the code:

package how_to_calculate_the_frequency; 

import java.io.File; 
import java.io.FileNotFoundException; 
import java.util.HashMap; 
import java.util.Scanner; 
import java.util.regex.Matcher; 
import java.util.regex.Pattern; 

public class Project { 

    HashMap<String, Integer> map = new HashMap<String, Integer>(); 

    public static void main(String[] args){ 

     Project project = new Project(); 

     Scanner INPUT_TEXT = project.readFile(); 

     project.analyse(INPUT_TEXT); 

     project.showResults(); 

    } 

    /** 
    * Logic to count the occurrences of words matched by the regex in a Scanner
    * that has loaded some text.
    * 
    * @param scanner 
    *   the scanner holding the text 
    */ 
    public void analyse(Scanner scanner) { 

     String pattern = "[a-zA-Z'-]+"; 
     Pattern r = Pattern.compile(pattern); 

     while (scanner.hasNext()) { 
      // read next word 
      String Stringcandidate = scanner.next(); 

      // see if pattern matches (boolean find) 
      Matcher matcher = r.matcher(Stringcandidate); 
      if (matcher.find()) { 
       String matchedWord = matcher.group(); 
       //System.out.println(matchedWord); //check what is matched 
       this.addWord(matchedWord); 

      } 

     } 
     scanner.close();// Close your Scanner. 
    } 

    /** 
    * Adds a word to the <word,count> map: if the word is new, a new entry is 
    * created; otherwise the count of this word is incremented. 
    */ 
    public void addWord(String matchedWord) { 

     if (map.containsKey(matchedWord)) { 
      // increment occurrence 
      int occurrence = map.get(matchedWord); 
      occurrence++; 
      map.put(matchedWord, occurrence); 
     } else { 
      // add word and set occurrence to 1 
      map.put(matchedWord, 1); 
     } 

    } 

    /** 
    * reads a file from disk and returns a scanner to analyse it 
    * 
    * @return the file from disk as scanner 
    */ 
    public Scanner readFile() { 

     Scanner scanner = null; 

     /* use that for reading a file from disk 
     * try { scanner = new Scanner(new 
     * File("moviereview.txt")).useDelimiter(" "); } catch (Exception e) { 
     * e.printStackTrace(); } 
     */ 

     scanner = new Scanner("auto bush trumped her tomato in the petunia auto"); 

     return scanner; 
    } 

    /** 
    * prints the matched words and their occurrences 
    * in a readable way 
    */ 
    public void showResults() { 

     for (java.util.Map.Entry<String, Integer> matchedWord : map.entrySet()) { 
      int occurrence = matchedWord.getValue(); 
      System.out.print("\"" + matchedWord.getKey() + "\" appears " + occurrence); 
      if (occurrence > 1) { 
       System.out.print(" times\n"); 
      } else { 
       System.out.print(" time\n"); 
      } 
     } 

     // or as a Java 8 lambda expression:
     // map.forEach((word, occurrence) -> System.out.println("\"" + word
     //     + "\" appears " + occurrence + (occurrence > 1 ? " times" : " time")));
    } 
} 

// DONE separate reading a file, analysing the file and the
// word-frequency-counting logic into different methods
// DONE implement a <word,count> Map and the logic to add new and
// known (already in the map) words
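The readFile method above returns a hard-coded string, and its commented-out variant uses `useDelimiter(" ")`, which splits only on spaces: a word at the end of one line and the first word of the next line end up in one token, and analyse keeps only the first match of each token. A sketch for reading moviereview.txt itself, assuming Java 8 or later; `FileWordCount` and `countFile` are hypothetical names. It keeps Scanner's default whitespace delimiter, closes the Scanner with try-with-resources, and counts every match inside a token.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileWordCount {

    // Same word pattern as in the question and the answer
    private static final Pattern WORD = Pattern.compile("[a-zA-Z'-]+");

    /** Reads the file token by token and counts every lower-cased WORD match. */
    public static Map<String, Integer> countFile(File file) throws FileNotFoundException {
        Map<String, Integer> counts = new HashMap<>();
        // try-with-resources closes the Scanner even if an exception is thrown;
        // the default delimiter splits on any whitespace, including line breaks
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNext()) {
                Matcher matcher = WORD.matcher(scanner.next());
                // a token such as "bush,tomato" can contain more than one match
                while (matcher.find()) {
                    counts.merge(matcher.group().toLowerCase(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws FileNotFoundException {
        countFile(new File("moviereview.txt")).forEach((word, n) ->
                System.out.println("\"" + word + "\" appears " + n + (n > 1 ? " times" : " time")));
    }
}
```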

This produces:

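"the" appears 1 time
"auto" appears 2 times
"her" appears 1 time
"in" appears 1 time
"bush" appears 1 time
"trumped" appears 1 time
"tomato" appears 1 time
"petunia" appears 1 time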
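The lines above come out in HashMap iteration order, which is unspecified and can change between runs or Java versions. If a stable order is wanted, one option is to sort the entries before printing. A minimal sketch, assuming a `Map<String, Integer>` like the answer's `map`; `SortedReport` and `report` are hypothetical names. It puts the most frequent words first and breaks ties alphabetically.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortedReport {

    /** Formats one line per word: highest count first, ties in alphabetical order. */
    public static List<String> report(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // descending count
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey()); // then by word
        });
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            int n = e.getValue();
            lines.add("\"" + e.getKey() + "\" appears " + n + (n > 1 ? " times" : " time"));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : "auto bush trumped her tomato in the petunia auto".split(" ")) {
            counts.merge(w, 1, Integer::sum);
        }
        report(counts).forEach(System.out::println);
    }
}
```

For this input, "auto" is printed first, followed by the remaining words in alphabetical order. A `TreeMap` instead of the `HashMap` would give purely alphabetical output without an explicit sort.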


2016-11-21 04:46:33
