2011-02-05 117 views
1

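I want to search a string for a specific word entered by the user, and then output the percentage of the text that the word makes up. I'm just wondering what the best way to do this is, if you can help me. Searching a string for a specific word. C#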

+0

What exactly do you mean by percentage? – 2011-02-05 11:55:34

+1

I assume he means (number_of_times_word_to_find_occurs / total_number_of_words) * 100. – david 2011-02-05 12:13:05

Answers

0

My suggestion is a complete class.

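using System;
using System.Text;

class WordCount {
    const string Symbols = ",;.:-()\t!¡¿?\"[]{}&<>+-*/=#'";

    // Lowercases the text and replaces punctuation and whitespace with spaces.
    public static string normalize(string str)
    {
        var toret = new StringBuilder();

        for(int i = 0; i < str.Length; ++i) {
            if (Symbols.IndexOf(str[ i ]) > -1 || char.IsWhiteSpace(str[ i ])) {
                toret.Append(' ');
            } else {
                toret.Append(char.ToLower(str[ i ]));
            }
        }

        return toret.ToString();
    }

    private string word;
    public string Word {
        get { return this.word; }
        set { this.word = value; }
    }

    private string str;
    public string Str {
        get { return this.str; }
    }

    // The split is computed on first access and cached afterwards.
    private string[] words = null;
    public string[] Words {
        get {
            if (this.words == null) {
                this.words = this.Str.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            }

            return this.words;
        }
    }

    public WordCount(string str, string w)
    {
        // Pad with spaces so the first and last words are delimited too.
        this.str = ' ' + normalize(str) + ' ';
        this.word = w;
    }

    public int Times()
    {
        return this.Times(this.Word);
    }

    public int Times(string word)
    {
        int times = 0;

        // Match whole words only; the text is lowercase, so lowercase the word as well.
        word = ' ' + word.ToLower() + ' ';

        int wordLength = word.Length;
        int pos = this.Str.IndexOf(word, StringComparison.Ordinal);

        while(pos > -1) {
            ++times;

            // Resume at the trailing space, which may also be the leading
            // space of the next occurrence (e.g. "is is").
            pos = this.Str.IndexOf(word, pos + wordLength - 1, StringComparison.Ordinal);
        }

        return times;
    }

    public double Percentage()
    {
        return this.Percentage(this.Word);
    }

    // Returns the share of the word among all words, on a 0-100 scale.
    public double Percentage(string word)
    {
        if (this.Words.Length == 0) {
            return 0.0;
        }

        // Multiply by 100.0 first to avoid integer division.
        return 100.0 * this.Times(word) / this.Words.Length;
    }
}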

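Advantages: the string split is cached, so there is no danger of performing it more than once. It is wrapped in a class, so it can easily be reused. There is no need for LINQ. Hope this helps.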

2

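The simplest way is to use LINQ:

char[] separators = new char[] {' ', ',', '.', '?', '!', ':', ';'};
var count =
    (from word in sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries) // get all the words
     where word.ToLower() == searchedWord.ToLower() // find the words that match
     select word).Count();                          // count them

This only counts the number of times the word appears in the text. You can also count how many words there are in the text:

var totalWords = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;

and then get the percentage:

var result = count * 100.0 / totalWords; // 100.0 avoids integer division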
+3

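There are many corner cases that this will miss. If you search for "two" in the sentence "one, two, three", you won't get any match, because split will give the element "two," (including the comma). This means you need to account for the various separators and strip them before splitting (unless the user is searching for them). – 2011-02-05 12:03:14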
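The corner case raised in this comment can be sketched as follows. This is a minimal example, not the answerer's code: the `WordStats` class and its method names are hypothetical, and the separator list is an illustrative assumption rather than an exhaustive one.

```csharp
using System;
using System.Linq;

public static class WordStats
{
    // Illustrative separator set (an assumption); extend it for your input.
    private static readonly char[] Separators =
        { ' ', ',', '.', '?', '!', ':', ';', '\t', '\r', '\n' };

    // Splits on punctuation as well as whitespace and drops empty entries,
    // so the text "one, two, three" yields "one", "two", "three".
    public static string[] Tokenize(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // Counts tokens equal to the word, ignoring case.
    public static int CountWord(string text, string word)
    {
        return Tokenize(text)
            .Count(t => t.Equals(word, StringComparison.OrdinalIgnoreCase));
    }

    // Share of tokens equal to the word on a 0-100 scale; 0 if there are no tokens.
    public static double Percentage(string text, string word)
    {
        int total = Tokenize(text).Length;
        return total == 0 ? 0.0 : 100.0 * CountWord(text, word) / total;
    }
}
```

With this sketch, `WordStats.CountWord("one, two, three", "two")` finds the word even though it is followed by a comma, because the comma is treated as a separator instead of staying attached to the token.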

3

I suggest using the String.Equals overload that takes a StringComparison, for better performance.

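var separators = new [] { ' ', ',', '.', '?', '!', ';', ':', '\"' };
// Drop empty entries so that consecutive separators do not count as words
var words = sentence.Split (separators, StringSplitOptions.RemoveEmptyEntries);
var matches = words.Count (w =>
    w.Equals (searchedWord, StringComparison.OrdinalIgnoreCase));
// words is an array, so use Length; the cast avoids integer division
var percentage = matches / (float) words.Length;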

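Note that percentage is a float, e.g. 0.5 for 50%. You can format it for display using a ToString overload:

var formatted = percentage.ToString ("P0"); // 0.1234 => 12 %

You can also change the format specifier to show decimal places:

var formatted = percentage.ToString ("P2"); // 0.1234 => 12.34 %

Keep in mind that this approach is inefficient for long strings, because it creates a string instance for every word it finds. You might want to use a StringReader and read word by word manually.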
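The StringReader suggestion could look roughly like the sketch below. It is only one possible reading of that idea: the `StreamingWordCounter` class and its `Fraction` method are hypothetical names, and treating any run of letters or digits as a word is an assumption (so "don't" would count as two words). Characters are compared in place, so no string is allocated per word.

```csharp
using System;
using System.IO;

public static class StreamingWordCounter
{
    // Returns matches / total words as a fraction (0.5 means 50%).
    public static float Fraction(TextReader reader, string searchedWord)
    {
        int total = 0, matches = 0;
        int length = 0;        // characters seen so far in the current word
        bool matching = true;  // current word still equals searchedWord so far
        int c;

        do
        {
            c = reader.Read(); // -1 signals end of input
            bool isWordChar = c != -1 && char.IsLetterOrDigit((char)c);

            if (isWordChar)
            {
                // Compare character by character instead of building a string.
                if (matching && (length >= searchedWord.Length ||
                    char.ToUpperInvariant((char)c) !=
                    char.ToUpperInvariant(searchedWord[length])))
                {
                    matching = false;
                }
                length++;
            }
            else if (length > 0)
            {
                // A word just ended: tally it and reset the per-word state.
                total++;
                if (matching && length == searchedWord.Length) matches++;
                length = 0;
                matching = true;
            }
        } while (c != -1);

        return total == 0 ? 0f : matches / (float)total;
    }
}
```

A call such as `StreamingWordCounter.Fraction(new StringReader(sentence), searchedWord)` returns the same kind of fraction as `percentage` above, and the result can be formatted with `ToString ("P0")` in the same way.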

0
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The words you want to search for
var words = new string[] { "this", "is" };

// Build a regular expression query of the form \b(this|is)\b
var wordRegexQuery = new System.Text.StringBuilder();
wordRegexQuery.Append("\\b(");
for (var wordIndex = 0; wordIndex < words.Length; wordIndex++)
{
    // Escape each word so that regex metacharacters are matched literally
    wordRegexQuery.Append(Regex.Escape(words[wordIndex]));
    if (wordIndex < words.Length - 1)
    {
        wordRegexQuery.Append('|');
    }
}
wordRegexQuery.Append(")\\b");

// Find matches and return them as a string[]
var regex = new Regex(wordRegexQuery.ToString(), RegexOptions.IgnoreCase);
var someText = "This is some text which is quite a good test of which word is used most often. Thisis isthis athisisa.";
var matches = (from Match m in regex.Matches(someText) select m.Value).ToArray();

// Display results; each percentage is the word's share of all matches,
// not of all words in the text
foreach (var word in words)
{
    var wordCount = matches.Count(w => w.Equals(word, StringComparison.InvariantCultureIgnoreCase));
    Console.WriteLine("{0}: {1} ({2:f2}%)", word, wordCount, wordCount * 100f / matches.Length);
}