2014-12-07 44 views
0
import java.util.TreeMap; 
import java.io.*; 
import java.util.Map; 

public class ReadFile { 
    public static TreeMap<String, Integer> generateFrequencyList() 
      throws IOException { 
     TreeMap<String, Integer> wordsFrequencyMap = new TreeMap<String, Integer>(); 
     String file = "file1.txt"; 
     BufferedReader br = new BufferedReader(new FileReader(file)); 
     String line; 
     while ((line = br.readLine()) != null) { 
      String[] tokens = line.split("\\s+"); 
      for (String token : tokens) { 
       token = removePunctuation(token); 
       if (!wordsFrequencyMap.containsKey(token.toLowerCase())) { 
        wordsFrequencyMap.put(token.toLowerCase(), 1); 
       } else { 
        int count = wordsFrequencyMap.get(token.toLowerCase()); 
        wordsFrequencyMap.put(token.toLowerCase(), count + 1); 
       } 
      } 
     } 
     return wordsFrequencyMap; 
    } 

    private static String removePunctuation(String token) { 
     token = token.replaceAll(",", "").replaceAll("\\.", "").replaceAll(";", "").replaceAll("!", ""); 
     return token; 
    } 

    public static void main(String[] args) { 
     try { 
      TreeMap<String, Integer> freqMap = generateFrequencyList(); 
      for (final Map.Entry<String, Integer> entry : freqMap.entrySet()) { 
       final String key = entry.getKey(); 
       final Integer value = entry.getValue(); 
       float total = 0; 
       for (final Integer wordCount : freqMap.values()) { 
        total += wordCount; 
       } 
       final float percentage = (value/total) * 100; 
       System.out.println(key + " = " + value + " => " + percentage); 
      } 
     } catch (Exception e) { 
      e.printStackTrace(); 
     } 
    } 
} 

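I need this program to read a .txt file and return a list of words with their frequencies and percentages. The .txt file I am using has more words than what is printed. It seems to print only the words from T to Z. I am not sure how to fix it so that it gives me all the words. Does anyone have an idea why it is not giving me the full word list?

File reader program not printing words for the whole alphabet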

+0

Could you provide a sample of your input file? – 2014-12-07 02:30:20

+0

See the comment I left on my answer to your other question. Also, you don't need to recompute "total" on every iteration. – outlyer 2014-12-07 02:47:11

Answers

0

Works fine for me. Your test file and an example of its contents might help. Just a tip: you might want to calculate "total" outside the for loop. :)

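I/P: "I need this program to read a .txt file and return a list of words, frequency and percentage. The .txt file I am using has more words than what is printing out. It seems to only be printing out words from T - Z. I am not sure how to fix it to give me all the words. Anyone have some ideas why it is not giving me the full list of words?"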

O/P:

- = 1 => 1.388889
a = 1 => 1.388889
all = 1 => 1.388889
am = 2 => 2.777778
and = 2 => 2.777778
any = 1 => 1.388889
… = 1 => 1.388889
… = 1 => 1.388889
file = 2 => 2.777778
fix = 1 => 1.388889
frequency = 1 => 1.388889
… => 1.388889
full = 1 => 1.388889
give = 2 => 2.777778
has = 1 => 1.388889
have = 1 => 1.388889
how = 1 => 1.388889
i = 3 => 4.166667
ideas = 1 => 1.388889
it = 3 => 4.166667
list = 1 => 1.388889
list? = 1 => 1.388889
me = 2 => 2.777778
more = 1 => 1.388889
need = 1 => 1.388889
not = 2 => 2.777778
of = 1 => 1.388889
only = 1 => 1.388889
out = 2 => 2.777778
percentage = 1 => 1.388889
printing = 1 => 1.388889
prints = 1 => 1.388889
program = 1 => 1.388889
read = 1 => 1.388889
return = 1 => 1.388889
… = 1 => 1.388889
… = 1 => 1.388889
… = 1 => 1.388889
… = 1 => 1.388889
… => 1.388889
… => 2.947747
… = 2 => 2.777778
… => 5.555556
txt = 2 => 2.777778
using = 1 => 1.388889
why = 1 => 1.388889
word = 2 => 2.777778
words = 3 => 4.166667
z = 1 => 1.388889
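The tip about calculating "total" outside the loop could look roughly like the sketch below. The class name `FrequencyReport`, the helpers `totalCount` and `percentage`, and the sample map are all hypothetical and only illustrate the idea; in the question's program the map would come from `generateFrequencyList()`.

```java
import java.util.Map;
import java.util.TreeMap;

class FrequencyReport {
    // Sums all counts once, so the report loop does not repeat this work per entry.
    static long totalCount(Map<String, Integer> freqMap) {
        long total = 0;
        for (int count : freqMap.values()) {
            total += count;
        }
        return total;
    }

    // Share of the total that a single count represents, in percent.
    static double percentage(int count, long total) {
        return 100.0 * count / total;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for generateFrequencyList().
        Map<String, Integer> freqMap = new TreeMap<>();
        freqMap.put("the", 3);
        freqMap.put("word", 1);

        long total = totalCount(freqMap); // computed once, before the loop
        for (Map.Entry<String, Integer> entry : freqMap.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue()
                    + " => " + percentage(entry.getValue(), total));
        }
    }
}
```

This only removes repeated work (the original recomputes the sum for every entry); it does not change which words are printed.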

0

I ran the code here and it looks fine to me too. Here is my output:

a = 5 => 3.164557 
accept = 1 => 0.6329114 
acceptance = 1 => 0.6329114 
accepts = 1 => 0.6329114 
and = 1 => 0.6329114 
any = 3 => 1.8987341 
applicable = 1 => 0.6329114 
are = 2 => 1.2658228 
at = 2 => 1.2658228 
available = 1 => 0.6329114 
be = 4 => 2.5316455 
been = 1 => 0.6329114 
being = 1 => 0.6329114 
billed = 1 => 0.6329114 
buy = 1 => 0.6329114 
by = 2 => 1.2658228 
can = 1 => 0.6329114 
card = 1 => 0.6329114 
charge = 1 => 0.6329114 
charges = 1 => 0.6329114 
confirmation = 1 => 0.6329114 
constitute = 1 => 0.6329114 
credit = 1 => 0.6329114 
customer = 2 => 1.2658228 
deposit = 1 => 0.6329114 
direct = 1 => 0.6329114 
do = 1 => 0.6329114 
does = 1 => 0.6329114 
email = 3 => 1.8987341 
for = 3 => 1.8987341 
from = 2 => 1.2658228 
hardware = 1 => 0.6329114 
has = 1 => 0.6329114 
if = 2 => 1.2658228 
is = 1 => 0.6329114 
lenovo = 3 => 1.8987341 
lenovo's = 1 => 0.6329114 
limit = 1 => 0.6329114 
locations = 1 => 0.6329114 
made = 1 => 0.6329114 
making = 1 => 0.6329114 
may = 1 => 0.6329114 
multiple = 1 => 0.6329114 
no = 1 => 0.6329114 
not = 3 => 1.8987341 
notify = 1 => 0.6329114 
number = 1 => 0.6329114 
of = 2 => 1.2658228 
once = 1 => 0.6329114 
one = 1 => 0.6329114 
only = 1 => 0.6329114 
or = 5 => 3.164557 
order = 5 => 3.164557 
orders = 1 => 0.6329114 
particular = 1 => 0.6329114 
payment = 1 => 0.6329114 
phone = 1 => 0.6329114 
processed = 2 => 1.2658228 
product = 4 => 2.5316455 
providing = 1 => 0.6329114 
purchase = 1 => 0.6329114 
reason = 1 => 0.6329114 
receive = 1 => 0.6329114 
refunded = 1 => 0.6329114 
refuse = 1 => 0.6329114 
reserves = 1 => 0.6329114 
right = 1 => 0.6329114 
sell = 1 => 0.6329114 
service = 1 => 0.6329114 
shipped = 2 => 1.2658228 
shipping = 4 => 2.5316455 
single = 1 => 0.6329114 
software = 1 => 0.6329114 
some = 1 => 0.6329114 
thank = 1 => 0.6329114 
that = 1 => 0.6329114 
the = 5 => 3.164557 
this = 2 => 1.2658228 
time = 2 => 1.2658228 
to = 6 => 3.7974682 
units = 1 => 0.6329114 
updates = 1 => 0.6329114 
we = 2 => 1.2658228 
will = 5 => 3.164557 
you = 5 => 3.164557 
your = 8 => 5.063291 

FYI, I did add br.close() because my editor complained that the BufferedReader was not being closed, but that does not affect the output.

      }
     }
     br.close();
     return wordsFrequencyMap;
+0

I am confused about why it is not printing all the words for me. Could it be my .txt file? – WillLaPenta 2014-12-07 20:45:31
