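How to ignore punctuation in C#

I want to ignore punctuation. I am trying to write a program that counts every occurrence of each word in my text, without taking punctuation into account. Here is my program: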
static void Main(string[] args)
{
    string text = "This my world. World, world,THIS WORLD ! Is this - the world .";
    IDictionary<string, int> wordsCount =
        new SortedDictionary<string, int>();
    text = text.ToLower();
    // Attempted fix, commented out: string has no replaceAll method in C# (that is Java),
    // so this line does not compile.
    // text = text.replaceAll("[^0-9a-zA-Z\text]", "X");
    string[] words = text.Split(' ', ',', '-', '!', '.');
    foreach (string word in words)
    {
        // Start at 1 for a new word, otherwise increment the stored count.
        int count = 1;
        if (wordsCount.ContainsKey(word))
            count = wordsCount[word] + 1;
        wordsCount[word] = count;
    }
    // Print the words ordered by how often they occur.
    var items = from pair in wordsCount
                orderby pair.Value ascending
                select pair;
    foreach (var p in items)
    {
        Console.WriteLine("{0} -> {1}", p.Key, p.Value);
    }
}
The output is:
is -> 1
my -> 1
the -> 1
this -> 3
world -> 5
(here is nothing) -> 8
How can I remove the punctuation here?
Use 'text.Split(new[] {" ", ",", "-", "!", "."}, StringSplitOptions.RemoveEmptyEntries);' to exclude the empty entries. – Kvam
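
The empty key with a count of 8 appears because two adjacent separators (for example a period followed by a space) produce an empty string between them. A minimal sketch of the suggestion above follows, together with a regex-based alternative in the spirit of the replaceAll attempt in the question. The names WordCounter, CountBySplit, CountByRegex and Tally are made up for illustration, and ToLowerInvariant is used instead of ToLower as an assumption, so that the result does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original question.
static class WordCounter
{
    // Splits on a fixed set of separators and drops the empty entries
    // that appear between two adjacent separators.
    public static SortedDictionary<string, int> CountBySplit(string text)
    {
        char[] separators = { ' ', ',', '-', '!', '.' };
        string[] words = text.ToLowerInvariant()
            .Split(separators, StringSplitOptions.RemoveEmptyEntries);
        return Tally(words);
    }

    // Alternative: treats every run of ASCII letters or digits as a word,
    // so any other character acts as a separator.
    public static SortedDictionary<string, int> CountByRegex(string text)
    {
        IEnumerable<string> words = Regex
            .Matches(text.ToLowerInvariant(), "[a-z0-9]+")
            .Cast<Match>()
            .Select(m => m.Value);
        return Tally(words);
    }

    // Counts how many times each word occurs.
    private static SortedDictionary<string, int> Tally(IEnumerable<string> words)
    {
        var counts = new SortedDictionary<string, int>();
        foreach (string word in words)
        {
            // TryGetValue leaves count at 0 when the word is not present yet.
            counts.TryGetValue(word, out int count);
            counts[word] = count + 1;
        }
        return counts;
    }
}
```

With the sample text from the question, WordCounter.CountBySplit(text) should give is -> 1, my -> 1, the -> 1, this -> 3, world -> 5, with no empty key. The regex variant gives the same counts here and also copes with punctuation that is not in the separator list, at the cost of treating non-ASCII letters as separators.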