// filename holds the number of files to scan (abc0.txt, abc1.txt, ...)
for (a = 0; a < filename; a++) {
    try {
        System.out
                .println(" _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ");
        System.out.println("\n");
        System.out.println("The word inputted : " + word2);
        File file = new File(
                "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc" + a
                        + ".txt");
        System.out.println(" _________________");
        System.out.print("| File = abc" + a + ".txt | \t\t \n");
        // array2 holds the individual query words
        for (int i = 0; i < array2.length; i++) {
            totalCount = 0;
            wordCount = 0;
            // The file is re-read from the start for every query word
            Scanner s = new Scanner(file);
            while (s.hasNext()) {
                totalCount++;
                if (s.next().equals(array2[i]))
                    wordCount++;
            }
            s.close(); // release the file handle before the next pass
            System.out.print(array2[i] + " --> Word count = "
                    + "\t " + "|" + wordCount + "|");
            System.out.print(" Total count = " + "\t " + "|"
                    + totalCount + "|");
            System.out.printf(" Term Frequency = | %8.4f |",
                    (double) wordCount / totalCount);
            System.out.println("\t ");
            // numDoc = number of files, numofDoc[i] = files containing word i
            double inverseTF = Math.log10((float) numDoc
                    / (numofDoc[i]));
            System.out.println(" --> IDF = " + inverseTF);
            double TFIDF = (((double) wordCount / totalCount) * inverseTF);
            System.out.println(" --> TF/IDF = " + TFIDF + "\n");
        }
    } catch (FileNotFoundException e) {
        System.out.println("File is not found");
    }
}
}
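This is my code to calculate the term frequency of each query word I enter. Now I am trying to add up the counts of all the query words for each file. How can I total the query word counts per file?
Sample output:
The number of files in this folder is : 11
Please enter the query : how are you
how --> Number of files that contain this word: 3
are --> Number of files that contain this word: 7
you --> Number of files that contain this word: 7
The word inputted : how are you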
| File = abc0.txt |
how --> Word count = | 4 | Total count = | 957 | Term Frequency = | 0.0042 |
 --> IDF = 0.5642714398516419
 --> TF/IDF = 0.0023585013159943234
are --> Word count = | 7 | Total count = | 957 | Term Frequency = | 0.0073 |
 --> IDF = 0.1962946357308887
 --> TF/IDF = 0.00143580193324579
you --> Word count = | 10 | Total count = | 957 | Term Frequency = | 0.0104 |
 --> IDF = 0.1962946357308887
 --> TF/IDF = 0.002051145618922557
Example: the total frequency is 4 + 7 + 10 = 21 ..
The word inputted : how are you
| File = abc1.txt |
how --> Word count = | 4 | Total count = | 959 | Term Frequency = | 0.0042 |
 --> IDF = 0.5642714398516419
 --> TF/IDF = 0.0023535826479734803
are --> Word count = | 7 | Total count = | 959 | Term Frequency = | 0.0073 |
 --> IDF = 0.1962946357308887
 --> TF/IDF = 0.0014328075600794795
you --> Word count = | 10 | Total count = | 959 | Term Frequency = | 0.0104 |
 --> IDF = 0.1962946357308887
 --> TF/IDF = 0.002046867942970685
How can I make it total the word counts of the 3 query words for each file?
Example: the total frequency is 4 + 7 + 10 = 21 ..
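For illustration, here is a minimal sketch of the kind of per-file total being asked for. It is not the code from the post: the class name QueryTotals and the methods countQueryWords and total are hypothetical, and the input is a stand-in string rather than one of the abc*.txt files. It counts every query word in a single pass over the tokens, then sums the per-word counts.

```java
import java.util.Scanner;

// Hypothetical helper class, not part of the original program.
public class QueryTotals {

    // Counts how often each query word occurs among the whitespace-separated
    // tokens. The input is read once, not once per query word.
    public static int[] countQueryWords(Scanner s, String[] query) {
        int[] counts = new int[query.length];
        while (s.hasNext()) {
            String token = s.next();
            for (int i = 0; i < query.length; i++) {
                if (token.equals(query[i])) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    // Adds up the per-word counts to get the total for one file.
    public static int total(int[] counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] query = {"how", "are", "you"};
        // Stand-in text (assumption); the real program would use new Scanner(file).
        Scanner s = new Scanner("how are you and how are they");
        int[] counts = countQueryWords(s, query);
        s.close();
        System.out.println("Total frequency = " + total(counts)); // prints "Total frequency = 5"
    }
}
```

In the loop from the question, the smallest change would probably be to declare something like int totalFrequency = 0; just before the for (int i ...) loop, add totalFrequency += wordCount; once the while loop has finished, and print totalFrequency after the inner for loop ends. With the counts shown for abc0.txt that would give 4 + 7 + 10 = 21.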
Possible duplicate of [How to sum total value?](http://stackoverflow.com/questions/5298489/how-to-sum总价值) – 2011-03-15 13:36:49
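No, this is a different problem I am facing, but I have already figured it out... thanks for your concern. – 2011-03-15 13:44:03
If that's the case, then it is really hard to figure out what you are actually asking. – 2011-03-15 13:45:53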