2013-03-11

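The program tokenizes the contents of a file and prints the tokens, but a few terms come out merged together. Why? The merged terms are visible in the output.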

import java.io.*;
import java.util.*;

class JavaApplication1
{
    static HashMap<String, Integer> hTable = new HashMap<String, Integer>();
    static int word, uwords, oncewords;

    public static void main(String args[]) throws IOException
    {
        // Directory holding the document files (assumed here to be passed as the first argument)
        File folder = new File(args[0]);
        File[] lFile = folder.listFiles();
        int len = lFile.length;
        for (int i = 0; i < 1; i++) {
            File file = lFile[i];
            if (file.isFile()) {
                Scanner scanner = new Scanner(file);
                String line = null;
                StringBuilder sb = new StringBuilder();
                while (scanner.hasNextLine()) {
                    line = scanner.nextLine();
                    sb.append(line);
                }
                scanner.close();
                // StringTokenizer st = new StringTokenizer(sb.toString(), "</>,?.[/]=()+|");
                StringTokenizer st = new StringTokenizer(sb.toString(), " </DOC>.,TITLE-\n");
                while (st.hasMoreTokens())
                {
                    String next = st.nextToken();
                    word = word + 1;
                    if (hTable.containsKey(next))
                    {
                        int a = hTable.get(next);
                        hTable.put(next, a + 1);
                        uwords++;
                    }
                    else
                    {
                        // First occurrence of this token: store it and print it
                        hTable.put(next, 1);
                        System.out.println(next);
                        oncewords++;
                    }
                }
            }
        }
        System.out.println("Total number of tokens in the database is" + word);
        System.out.println("Total number of tokens that are unique in the database are " + uwords);
        System.out.println("Total number of tokens that occur only once in the database is" + oncewords);

        int count = 0;
        Collection<Integer> setofvalues = hTable.values();
        Object[] Varr = setofvalues.toArray();
        Arrays.sort(Varr, Collections.reverseOrder());
        Set<Object> Set1 = new LinkedHashSet<Object>(Arrays.asList(Varr));
        for (Object i : Set1)
        {
            for (Map.Entry<String, Integer> entry : hTable.entrySet())
            {
                /* if (i.equals(entry.getValue()) && count < 30)
                {
                    System.out.println(entry.getKey() + "=" + entry.getValue());
                    count = count + 1;
                } */
            }
        }

        int avg = (word / len);
        System.out.println("The average number of tokens per document" + avg);
    }
}



The contents of the file are:
<DOC> 
<DOCNO> 
1 
</DOCNO> 
<TITLE> 
experimental investigation of the aerodynamics of a 
wing in a slipstream . 
</TITLE> 
<AUTHOR> 
brenckman,m. 
</AUTHOR> 
<BIBLIO> 
j. ae. scs. 25, 1958, 324. 
</BIBLIO> 
<TEXT> 
    an experimental study of a wing in a propeller slipstream was 
made in order to determine the spanwise distribution of the lift 
increase due to slipstream at different angles of attack of the wing 
and at different free stream to slipstream velocity ratios . the 
results were intended in part as an evaluation basis for different 
theoretical treatments of this problem . 
    the comparative span loading curves, together with supporting 
evidence, showed that a substantial part of the lift increment 
produced by the slipstream was due to a /destalling/ or boundary-layer-control 
effect . the integrated remaining lift increment, 
after subtracting this destalling lift, was found to agree 
well with a potential flow theory . 
    an empirical evaluation of the destalling effects was made for 
the specific configuration of the experiment . 
</TEXT> 
</DOC> 

And the output is:
N 
1 
experimental 
investigation 
of 
the 
aerodynamics 
awing 
in 
a 
slipstream 
AU 
H 
R 
brenckman 
m 
B 
j 
ae 
scs 
25 
1958 
324 
X 
an 
study 
wing 
propeller 
wasmade 
order 
to 
determine 
spanwise 
distribution 
liftincrease 
due 
at 
different 
angles 
attack 
wingand 
free 
stream 
velocity 
ratios 
theresults 
were 
intended 
part 
as 
evaluation 
basis 
for 
differenttheoretical 
treatments 
this 
problem 
comparative 
span 
loading 
curves 
together 
with 
supportingevidence 
showed 
that 
substantial 
lift 
incrementproduced 
by 
was 
destalling 
or 
boundary 
layer 
controleffect 
integrated 
remaining 
increment 
after 
subtracting 
found 
agreewell 
potential 
flow 
theory 
empirical 
effects 
made 
forthe 
specific 
configuration 
experiment 
Total number of tokens in the database is151 
Total number of tokens that are unique in the database are 58 
Total number of tokens that occur only once in the database is93 
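Separately from the merged words, the output also contains fragments such as N, AU, H, R and X. StringTokenizer treats its second argument as a set of single-character delimiters, not as a list of whole strings, so the uppercase letters in "</DOC>" and "TITLE" also split tag names such as <AUTHOR> apart. The sketch below reproduces this with the delimiter string from the question; the class name DelimiterDemo and the helper tokens are hypothetical, chosen only for illustration.

```java
import java.util.*;

class DelimiterDemo {
    // Splits text with StringTokenizer; every single character in delims acts as a delimiter.
    static List<String> tokens(String text, String delims) {
        StringTokenizer st = new StringTokenizer(text, delims);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Same delimiter string as in the question
        String delims = " </DOC>.,TITLE-\n";
        System.out.println(tokens("<AUTHOR>", delims)); // [AU, H, R]
        System.out.println(tokens("<TEXT>", delims));   // [X]
        System.out.println(tokens("<DOCNO>", delims));  // [N]
    }
}
```

Letters such as T, O and D inside the tag names are consumed as delimiters, which leaves only the fragments seen at the top of the output.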

Answer


The problem seems to be:

line = scanner.nextLine();
sb.append(line);

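When you read the lines into sb, you don't add any whitespace between them. Scanner.nextLine() returns each line without its line terminator, so the last word of one line is merged with the first word of the next line (for example "was" + "made" becomes "wasmade").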
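A minimal sketch of the fix, assuming the rest of the tokenizing logic stays the same: append a separator after each line so the tokenizer sees a boundary there. The class LineJoinDemo and the method tokenize are hypothetical names for illustration, the sample lines are taken from the file contents above, and the delimiter set is reduced to space and newline for simplicity.

```java
import java.util.*;

class LineJoinDemo {
    // Joins the lines with the given separator, then splits on spaces and newlines.
    static List<String> tokenize(List<String> lines, String separator) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append(separator);
        }
        StringTokenizer st = new StringTokenizer(sb.toString(), " \n");
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "an experimental study of a wing in a propeller slipstream was",
            "made in order to determine");
        System.out.println(tokenize(lines, ""));   // contains "wasmade": lines glued together
        System.out.println(tokenize(lines, "\n")); // "was" and "made" stay separate
    }
}
```

In the question's loop the equivalent change would be sb.append(line).append('\n'); since '\n' is already in the delimiter string, a space would work just as well.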