2017-01-10 62 views
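Java - How to delimit single quotes around a phrase but not apostrophes within words

I am teaching myself Java from a book. I read the chapter on text processing and wrapper classes and attempted the exercise below.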

Word Counter

Write a program that asks the user for the name of a file. The program should display the number of words that the file contains.

import java.io.File; 
import java.io.IOException; 
import java.util.Scanner; 
import java.util.StringTokenizer; 

public class FileWordCounter { 

    public static void main(String[] args) throws IOException { 

     // Create a Scanner object 
     Scanner keyboard = new Scanner(System.in); 

     // Ask user for filename 
     System.out.print("Enter the name of a file: "); 
     String filename = keyboard.nextLine(); 

     // Open file for reading 
     File file = new File(filename); 
     Scanner inputFile = new Scanner(file); 

     int words = 0; 
     String word = ""; 

     while (inputFile.hasNextLine()) { 
      String line = inputFile.nextLine(); 
      System.out.println(line); // for debugging 
      StringTokenizer stringTokenizer = new StringTokenizer(line, " \n.!?;,()"); // Create a StringTokenizer object and use the current line contents and delimiters as parameters 
      while (stringTokenizer.hasMoreTokens()) { // for each line do this 
       word = stringTokenizer.nextToken(); 
       System.out.println(word); // for debugging 
       words++; 
      } 
      System.out.println("Line contains " + words + " words"); 
     } 

     // Close file 
     inputFile.close(); 

     System.out.println("The file has " + words + " words."); 
    } 

} 

I picked this random poem from the web to test the program. I put the poem in a file named TheSniper.txt:

Two hundred yards away he saw his head; 
He raised his rifle, took quick aim and shot him. 
Two hundred yards away the man dropped dead; 
With bright exulting eye he turned and said, 
'By Jove, I got him!' 
And he was jubilant; had he not won 
The meed of praise his comrades haste to pay? 
He smiled; he could not see what he had done; 
The dead man lay two hundred yards away. 
He could not see the dead, reproachful eyes, 
The youthful face which Death had not defiled 
But had transfigured when he claimed his prize. 
Had he seen this perhaps he had not smiled. 
He could not see the woman as she wept 
To the news two hundred miles away, 
Or through his very dream she would have crept. 
And into all his thoughts by night and day. 
Two hundred yards away, and, bending o'er 
A body in a trench, rough men proclaim 
Sadly, that Fritz, the merry is no more. 
(Or shall we call him Jack? It's all the same.) 

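Here is some of my output. For debugging purposes, I print each line, each word found in it, and the running total of words in the file so far, including those in the current line.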

Enter the name of a file: TheSniper.txt 
Two hundred yards away he saw his head; 
Two 
hundred 
yards 
away 
he 
saw 
his 
head 
Line contains 8 words 
He raised his rifle, took quick aim and shot him. 
He 
raised 
his 
rifle 
took 
quick 
aim 
and 
shot 
him 
Line contains 18 words 
... 

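In the end, my program reports that the poem has 176 words, but Microsoft Word counts 174. From printing each word, I can see that I am miscounting around apostrophes and single quotes. Here is the last part of my output, where the problem occurs: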

(Or shall we call him Jack? It's all the same.) 
Or 
shall 
we 
call 
him 
Jack 
It 
s 
all 
the 
same 
Line contains 176 words 
The file has 176 words 

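In my StringTokenizer's delimiter list, when I do not delimit on the single quote, which looks the same as an apostrophe, the word "It's" is counted as one word. However, when I do delimit on it, "It's" is counted as two words (It and s), because the apostrophe is treated as a delimiter. Also, the phrase 'By Jove, I got him!' is miscounted when I do not delimit on the single quote/apostrophe. Are the apostrophe and the single quote the same character when it comes to delimiting them? I am not sure how to delimit the single quotes that surround a phrase without also splitting on the apostrophe inside a word like "It's". I hope my question is reasonably clear; please ask for clarification if it is not. Any guidance is appreciated. Thanks!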

+1

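Is there any reason why you can't just use whitespace (space, tab, newline) as your delimiters? For the phrase 'By Jove, I got him!' it doesn't matter that the first word comes out as 'By and the last as him!' for the purpose of _counting_ words, even if it doesn't look nice when printing the words found (which is only for debugging, according to your comment). (See also http://stackoverflow.com/questions/8813779/)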

+0

Thanks! That makes sense. – camelCoder

Answers

1

Why not use another Scanner for each line to count the words?

    int words = 0;
    while (inputFile.hasNextLine()) {
        int lineLength = 0;
        Scanner lineScanner = new Scanner(inputFile.nextLine());
        while (lineScanner.hasNext()) {
            System.out.println(lineScanner.next());
            lineLength++;
        }
        System.out.println("Line contains " + lineLength + " words");
        words += lineLength;
    }

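I don't believe it is possible to delimit the single quotes around a phrase like 'By Jove, I got him!' while ignoring the one in "It's", unless you use a regular expression search that ignores single quotes inside words.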
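To make the regular expression idea concrete, here is a minimal sketch. In plain ASCII text the apostrophe and the single quote are the same character (U+0027), so the character alone cannot tell them apart; only its position can. The sketch therefore matches words instead of splitting on delimiters, and accepts an apostrophe only when it has a letter or digit on both sides. The class name ApostropheWordCounter, the method countWords, and the exact pattern are illustrative choices, not something from the question or the book.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApostropheWordCounter {

    // A word is a run of letters or digits, optionally followed by one or more
    // groups of "apostrophe + letters or digits". An apostrophe that is not
    // enclosed by word characters on both sides (an opening or closing quote)
    // is never part of a match. The class also accepts the typographic
    // apostrophe U+2019, in case the file was saved with curly quotes.
    private static final Pattern WORD =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['\u2019][\\p{L}\\p{N}]+)*");

    // Counts the words in one line by counting pattern matches.
    static int countWords(String line) {
        Matcher matcher = WORD.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("'By Jove, I got him!'"));                           // prints 5
        System.out.println(countWords("(Or shall we call him Jack? It's all the same.)")); // prints 10
    }
}
```

With this pattern, "It's" and "o'er" each count as one word, while the quotes around 'By Jove, I got him!' are skipped. One limitation of this sketch is that a hyphenated word such as "well-known" counts as two words.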

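Alternatively, you could treat the characters ".!?;,()" as part of a word (so that "Jack?" is one word), which will give you the correct word count. This is how Scanner behaves by default. Just change the delimiter in your StringTokenizer to " " (the \n is not needed, since you are already reading the file line by line):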

StringTokenizer stringTokenizer = new StringTokenizer(line, " "); 
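One caveat with a single-space delimiter is that words separated by a tab would be counted as one token. A small variation, shown here only as a sketch, is to use the one-argument StringTokenizer constructor, whose default delimiter set is " \t\n\r\f", and let countTokens() do the counting. The class name WhitespaceWordCounter and the method countWords are made up for this example.

```java
import java.util.StringTokenizer;

class WhitespaceWordCounter {

    // Counts whitespace-separated tokens in one line. Punctuation and quote
    // characters stay attached to their words, which does not affect the count.
    static int countWords(String line) {
        // The one-argument constructor uses the default delimiters " \t\n\r\f".
        return new StringTokenizer(line).countTokens();
    }

    public static void main(String[] args) {
        System.out.println(countWords("'By Jove, I got him!'"));                           // prints 5
        System.out.println(countWords("(Or shall we call him Jack? It's all the same.)")); // prints 10
    }
}
```

Inside the file-reading loop, the per-line result can be added to a running total, in the same way the Scanner version above adds lineLength to words.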