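Java - How to delimit single quotes around a phrase but not apostrophes within a word
I am teaching myself Java from a book. I read the chapter on text processing and wrapper classes and attempted the exercise below.
Word Counter
Write a program that asks the user for the name of a file. The program should display the number of words that the file contains.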
import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;

public class FileWordCounter {
    public static void main(String[] args) throws IOException {
        // Create a Scanner object
        Scanner keyboard = new Scanner(System.in);

        // Ask user for filename
        System.out.print("Enter the name of a file: ");
        String filename = keyboard.nextLine();

        // Open file for reading
        File file = new File(filename);
        Scanner inputFile = new Scanner(file);

        int words = 0;
        String word = "";

        while (inputFile.hasNextLine()) {
            String line = inputFile.nextLine();
            System.out.println(line); // for debugging

            // Create a StringTokenizer from the current line and the delimiter characters
            StringTokenizer stringTokenizer = new StringTokenizer(line, " \n.!?;,()");

            while (stringTokenizer.hasMoreTokens()) { // for each token on this line
                word = stringTokenizer.nextToken();
                System.out.println(word); // for debugging
                words++;
            }
            // Running total so far, including the current line
            System.out.println("Line contains " + words + " words");
        }

        // Close file
        inputFile.close();

        System.out.println("The file has " + words + " words.");
    }
}
I picked this poem at random from the web to test the program. I put it in a file named TheSniper.txt:
Two hundred yards away he saw his head;
He raised his rifle, took quick aim and shot him.
Two hundred yards away the man dropped dead;
With bright exulting eye he turned and said,
'By Jove, I got him!'
And he was jubilant; had he not won
The meed of praise his comrades haste to pay?
He smiled; he could not see what he had done;
The dead man lay two hundred yards away.
He could not see the dead, reproachful eyes,
The youthful face which Death had not defiled
But had transfigured when he claimed his prize.
Had he seen this perhaps he had not smiled.
He could not see the woman as she wept
To the news two hundred miles away,
Or through his very dream she would have crept.
And into all his thoughts by night and day.
Two hundred yards away, and, bending o'er
A body in a trench, rough men proclaim
Sadly, that Fritz, the merry is no more.
(Or shall we call him Jack? It's all the same.)
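Here is some of my output. For debugging purposes, I print each line, each word, and the running total of words in the file so far, including those on the current line.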
Enter the name of a file: TheSniper.txt
Two hundred yards away he saw his head;
Two
hundred
yards
away
he
saw
his
head
Line contains 8 words
He raised his rifle, took quick aim and shot him.
He
raised
his
rifle
took
quick
aim
and
shot
him
Line contains 18 words
...
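In the end, my program reports that the poem has 176 words. However, Microsoft Word counts 174 words. From printing each word, I can see that I am miscounting around apostrophes and single quotes. Here is the last line of the poem in my output, where the problem shows up: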
(Or shall we call him Jack? It's all the same.)
Or
shall
we
call
him
Jack
It
s
all
the
same
Line contains 176 words
The file has 176 words.
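In my StringTokenizer's delimiter list, when I do not include the single quote (which looks the same as an apostrophe), the word "It's" is counted as one word. But when I do include it, "It's" is counted as two words (It and s), because the apostrophe, which looks like a single quote, is treated as a delimiter. On the other hand, the phrase 'By Jove, I got him!' is miscounted when I do not delimit on the single quote/apostrophe. Are the apostrophe and the single quote the same character as far as delimiting is concerned? I'm not sure how to delimit the single quotes that surround a phrase without also delimiting the one between the letters of a word like "It's". I hope my question is reasonably clear. Please ask if anything needs clarification. Any guidance is appreciated. Thanks!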
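The two miscounts described above can be reproduced in isolation. The following is only a small sketch: the class name DelimiterComparison and the helper count are made up for illustration, and the delimiter strings are the one from the program plus a variant with ' appended.

```java
import java.util.StringTokenizer;

class DelimiterComparison {
    // Hypothetical helper: number of tokens StringTokenizer finds with the given delimiters.
    static int count(String line, String delimiters) {
        return new StringTokenizer(line, delimiters).countTokens();
    }

    public static void main(String[] args) {
        String withoutQuote = " \n.!?;,()";      // delimiter set used in the program
        String withQuote = withoutQuote + "'";   // same set plus the single quote

        String quoted = "'By Jove, I got him!'";
        String contraction = "(Or shall we call him Jack? It's all the same.)";

        // Without ' as a delimiter, the closing quote after "!" becomes a token of its own.
        System.out.println(count(quoted, withoutQuote));      // 6
        System.out.println(count(quoted, withQuote));         // 5

        // With ' as a delimiter, "It's" is split into "It" and "s".
        System.out.println(count(contraction, withoutQuote)); // 10
        System.out.println(count(contraction, withQuote));    // 11
    }
}
```

Neither delimiter set gets both lines right, which is the problem in a nutshell: StringTokenizer decides character by character and has no way to tell a quote at the edge of a phrase from an apostrophe inside a word.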
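Is there any reason why you can't just use whitespace (space, tab, newline) as your delimiters? For the phrase 'By Jove, I got him!', it doesn't matter for the purposes of _counting_ words if the first word is "'By" and the last is "him!'", even if it doesn't look nice when you print out the words found (which is only for debugging, according to your comments). (See also http://stackoverflow.com/questions/8813779/)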
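A minimal sketch of the whitespace-only idea, assuming that punctuation stuck to a word does not matter for the count. The class name WhitespaceWordCount and the method countWords are hypothetical; the only library behavior relied on is that a StringTokenizer built without a delimiter argument splits on space, tab, newline, carriage return, and form feed.

```java
import java.util.StringTokenizer;

class WhitespaceWordCount {
    // Counts tokens separated by whitespace only; punctuation stays attached to its word.
    static int countWords(String text) {
        // The one-argument constructor uses the default delimiters " \t\n\r\f".
        return new StringTokenizer(text).countTokens();
    }

    public static void main(String[] args) {
        // Tokens: 'By  Jove,  I  got  him!'
        System.out.println(countWords("'By Jove, I got him!'"));                           // 5
        // "It's" stays in one piece because ' is not a delimiter.
        System.out.println(countWords("(Or shall we call him Jack? It's all the same.)")); // 10
    }
}
```

The trade-off is that a token made only of punctuation and surrounded by spaces (a free-standing dash, for example) would be counted as a word.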
Thanks! That makes sense. – camelCoder
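On the literal question: in a plain-text file the apostrophe and the straight single quote are normally the same character (U+0027), so a delimiter list cannot tell them apart. If clean words are wanted as well as a correct count, one alternative is to match words rather than split on delimiters. The sketch below is not from the thread. It assumes a word is a run of letters or digits that may contain single apostrophes between such runs, and the class name ApostropheWordCount and the pattern are illustrative choices.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApostropheWordCount {
    // Assumed definition of a word: letter/digit runs joined by a single
    // apostrophe, either straight (') or curly (U+2019).
    private static final Pattern WORD =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['\u2019][\\p{L}\\p{N}]+)*");

    static int countWords(String text) {
        Matcher matcher = WORD.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++; // each match is one word, e.g. "It's" or "o'er"
        }
        return count;
    }

    public static void main(String[] args) {
        // The surrounding quotes are skipped because a match must start and end with a letter or digit.
        System.out.println(countWords("'By Jove, I got him!'"));                           // 5
        System.out.println(countWords("(Or shall we call him Jack? It's all the same.)")); // 10
        System.out.println(countWords("Two hundred yards away, and, bending o'er"));       // 7
    }
}
```

An apostrophe counts as part of a word only when it has a letter or digit on both sides, so quotes around a phrase are dropped while "It's" and "o'er" stay whole. Words that begin or end with an apostrophe, such as 'tis, would lose it, though they would still be counted once.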