2011-03-12
// Calculating term frequency 
    System.out.println("Please enter the required word :"); 
    Scanner scan = new Scanner(System.in); 
    String word = scan.nextLine(); 

    String[] array = word.split(" "); 
    int filename = 11; 
    String[] fileName = new String[filename]; 
    int a = 0; 
    int totalCount = 0; 
    int wordCount = 0; 


    for (a = 0; a < filename; a++) { 

     try { 
      System.out.println("The word inputted is " + word); 
      File file = new File(
        "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc" + a 
          + ".txt"); 
      System.out.println(" _________________"); 

      System.out.print("| File = abc" + a + ".txt | \t\t \n"); 

      for (int i = 0; i < array.length; i++) { 

       totalCount = 0; 
       wordCount = 0; 

       Scanner s = new Scanner(file); 
       { 
        while (s.hasNext()) { 
         totalCount++; 
         if (s.next().equals(array[i])) 
          wordCount++; 

        } 

        System.out.print(array[i] + " ---> Word count = " 
          + "\t\t " + "|" + wordCount + "|"); 
        System.out.print(" Total count = " + "\t\t " + "|" 
          + totalCount + "|"); 
        System.out.printf(" Term Frequency = | %8.4f |", 
          (double) wordCount/totalCount); 

        System.out.println("\t "); 

       } 
      } 
     } catch (FileNotFoundException e) { 
      System.out.println("File is not found"); 

     } 

    } 

    System.out.println("Please enter the required word :"); 
    Scanner scan2 = new Scanner(System.in); 
    String word2 = scan2.nextLine(); 
    String[] array2 = word2.split(" "); 
    int numofDoc; 

    for (int b = 0; b < array2.length; b++) { 

     numofDoc = 0; 

     for (int i = 0; i < filename; i++) { 

      try { 

       BufferedReader in = new BufferedReader(new FileReader(
         "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc" 
           + i + ".txt")); 

       int matchedWord = 0; 

       Scanner s2 = new Scanner(in); 

       { 

        while (s2.hasNext()) { 
         if (s2.next().equals(array2[b])) 
          matchedWord++; 
        } 

       } 
       if (matchedWord > 0) 
        numofDoc++; 

      } catch (IOException e) { 
       System.out.println("File not found."); 
      } 

     } 
     System.out.println(array2[b] 
       + " --> This number of files that contain the term " 
       + numofDoc); 
     // numDoc is not declared in this excerpt; presumably the total number of documents
     double inverseTF = Math.log10((float) numDoc/numofDoc); 
     System.out.println(array2[b] + " --> IDF " + inverseTF); 
     double TFIDF = (((double) wordCount/totalCount) * inverseTF); 
     System.out.println(array2[b] + " --> TFIDF " + TFIDF); 
    } 
} 

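Hi, this is my code for calculating term frequency and TF-IDF. The first part calculates the term frequency of the given string for each file. The second part should use those values to calculate the TF-IDF for each file. But I only get one value, when it should give a TF-IDF value for every document. Why do I get only one TF-IDF result?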
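Written out as formulas, the two parts compute the following. The symbols are introduced here only for illustration: $n_{t,d}$ is wordCount, $|d|$ is totalCount, $N$ is numDoc and $\mathrm{df}_t$ is numofDoc.

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{|d|}, \qquad
\mathrm{idf}(t) = \log_{10}\frac{N}{\mathrm{df}_t}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
```

Because tf depends on the document $d$ while idf does not, each term should yield as many TF-IDF values as there are files.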

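Example output for term frequency:

The word inputted is is
| File = abc0.txt |
is ---> Word count = |2| Total count = |150| Term Frequency = | 0.0133 |

The word inputted is is
| File = abc1.txt |
is ---> Word count = |0| Total count = |9| Term Frequency = | 0.0000 |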

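TFIDF

is --> This number of files that contain the term 7

is --> IDF 0.1962946357308887

is --> TFIDF 0.0028607962606519654

This is where I expected one value per file: I have 10 files, so it should give 10 different values, one for each file. But it prints only one result. Can anyone point out my mistake?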
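As a quick check of the sample output, assuming numDoc equals filename = 11 (the first loop reads abc0.txt through abc10.txt; numDoc itself is not shown), the printed numbers match the formulas:

```latex
\mathrm{tf}(\text{is}, \text{abc0.txt}) = \frac{2}{150} \approx 0.0133, \qquad
\mathrm{idf}(\text{is}) = \log_{10}\frac{11}{7} \approx 0.19629
```

The single TFIDF line is this idf multiplied by whatever wordCount/totalCount still held after the first loop finished, i.e. the counts for the last word in the last file, not a per-file value.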


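Apart from the actual answer (given by Howard), you should pay more attention to naming. Having variables called "filename" and "fileName", one of which is an int, is very confusing. – 2011-03-12 08:42:10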

Answers


The part that prints the TFIDF needs to be moved inside the for loop that iterates over all the files.

That is:

    System.out.println(array2[b] 
      + " --> This number of files that contain the term " 
      + numofDoc); 
    double inverseTF = Math.log10((float) numDoc/numofDoc); 
    System.out.println(array2[b] + " --> IDF " + inverseTF); 
    double TFIDF = (((double) wordCount/totalCount) * inverseTF); 
    System.out.println(array2[b] + " --> TFIDF " + TFIDF); 
} 

}}
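The moved block depends on numofDoc, the number of files containing the term, which is only complete once every file has been checked. Below is a minimal sketch of that counting and the IDF step as standalone helpers. It assumes in-memory strings instead of the abc*.txt files, and the class and method names (TfIdfHelpers, countTerm, documentFrequency, idf) are hypothetical. It closes each Scanner and guards against df == 0. In the question's expression (float) numDoc/numofDoc, that case is a float division by zero, so the result is Infinity and the logarithm is Infinity too.

```java
import java.util.List;
import java.util.Scanner;

public class TfIdfHelpers {
    /** Returns {wordCount, totalCount}, tokenizing like the question: Scanner.next() + equals(). */
    static int[] countTerm(String text, String term) {
        int wordCount = 0;
        int totalCount = 0;
        try (Scanner s = new Scanner(text)) { // closed automatically, unlike the original loops
            while (s.hasNext()) {
                totalCount++;
                if (s.next().equals(term)) {
                    wordCount++;
                }
            }
        }
        return new int[] {wordCount, totalCount};
    }

    /** Number of documents containing the term at least once (numofDoc in the question). */
    static int documentFrequency(List<String> docs, String term) {
        int df = 0;
        for (String doc : docs) {
            if (countTerm(doc, term)[0] > 0) {
                df++;
            }
        }
        return df;
    }

    /** log10(N / df); returning 0 when df == 0 is an assumed convention, not the question's. */
    static double idf(int numDocs, int df) {
        return df == 0 ? 0.0 : Math.log10((double) numDocs / df);
    }

    public static void main(String[] args) {
        List<String> docs = List.of("this is one", "another doc", "is it here");
        int df = documentFrequency(docs, "is");
        System.out.println("df = " + df + ", idf = " + idf(docs.size(), df));
    }
}
```

Computing df once per term, before any per-file TF-IDF is printed, avoids using a partially counted numofDoc. List.of requires Java 9 or later.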


The println statements you want repeated for each file are

double TFIDF = (((double) wordCount/totalCount) * inverseTF); 
System.out.println(array2[b] + " --> TFIDF " + TFIDF); 

but they are contained only in the single loop

for (int b = 0; b < array2.length; b++) 

If you want these lines printed for each file, you have to surround the statements with another loop that iterates over all the files.

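Since this is homework, I won't include the final code, but here is another hint: the TFIDF calculation also uses the variables wordCount and totalCount. These, however, are unique to each filename/word pair. So you need to store them not just once but for each filename/word pair, or restructure the code so they are recomputed inside the final loop.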
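The hint above (keep wordCount and totalCount per file, then loop over the files again) can be sketched as follows. This is not the answerer's code. It is a minimal illustration that assumes in-memory strings instead of the abc*.txt files and uses hypothetical names (TfIdfPerDocument, tfIdf). Unlike the question, it counts document frequency during the same pass that collects the per-file counts.

```java
import java.util.List;
import java.util.Scanner;

public class TfIdfPerDocument {
    /** Returns one TF-IDF value per document for a single term. */
    static double[] tfIdf(List<String> docs, String term) {
        int n = docs.size();
        int[] wordCount = new int[n];  // per-document occurrences of the term
        int[] totalCount = new int[n]; // per-document token count
        int docsWithTerm = 0;          // document frequency (numofDoc in the question)

        // Pass 1: store the counts for every document instead of overwriting two scalars.
        for (int d = 0; d < n; d++) {
            try (Scanner s = new Scanner(docs.get(d))) {
                while (s.hasNext()) {
                    totalCount[d]++;
                    if (s.next().equals(term)) {
                        wordCount[d]++;
                    }
                }
            }
            if (wordCount[d] > 0) {
                docsWithTerm++;
            }
        }

        // The IDF is known only after all documents have been scanned.
        double idf = docsWithTerm == 0 ? 0.0 : Math.log10((double) n / docsWithTerm);

        // Pass 2: combine each document's own TF with the shared IDF.
        double[] result = new double[n];
        for (int d = 0; d < n; d++) {
            double tf = totalCount[d] == 0 ? 0.0 : (double) wordCount[d] / totalCount[d];
            result[d] = tf * idf;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("this is one", "another doc", "is it is here");
        double[] scores = tfIdf(docs, "is");
        for (int d = 0; d < scores.length; d++) {
            System.out.printf("doc %d --> TFIDF %.4f%n", d, scores[d]);
        }
    }
}
```

For several query words, as with array2 in the question, call tfIdf once per word. Each call prints one line per document.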
