2016-05-16 75 views
2

I have been trying to find the most frequent words in a list of strings using LINQ. I tried something similar to "Find the most occurring number in a List<int>".

The problem is that it returns only one word, whereas I need every word that is tied for the highest frequency.

For example, if we run the LINQ query on the following list:

Dubai 
Karachi 
Lahore 
Madrid 
Dubai 
Sydney 
Sharjah 
Lahore 
Cairo 

it should give us:

Ans: Dubai, Lahore

+0

Where is the code you wrote to try to solve the problem? –

Answers

3

Use a group by, then order by count. The Where clause keeps only the words that occur more than once:

var result = list
    .GroupBy(s => s)
    .Where(g => g.Count() > 1)
    .OrderByDescending(g => g.Count())
    .Select(g => g.Key);
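
To see what this query returns when the repeated words do not all share the same count, here is a small sketch. The input list and the names `RepeatedWordsDemo` and `RepeatedWords` are made up for illustration; the query itself is the one from this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RepeatedWordsDemo
{
    // Same query as in the answer: words that occur more than once,
    // ordered from most to least frequent.
    public static List<string> RepeatedWords(IEnumerable<string> words)
    {
        return words
            .GroupBy(s => s)
            .Where(g => g.Count() > 1)
            .OrderByDescending(g => g.Count())
            .Select(g => g.Key)
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical input: "A" occurs three times, "B" twice, "C" once.
        var words = new List<string> { "A", "B", "A", "C", "B", "A" };

        // Prints "A, B": every repeated word, not only the top one.
        Console.WriteLine(string.Join(", ", RepeatedWords(words)));
    }
}
```

On the list from the question the result happens to match the expected answer, because Dubai and Lahore both occur twice and nothing else repeats. With unequal counts, as in the sketch, less frequent repeats are returned as well.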
+0

Just an unrelated question: can we restrict the selection to only the words that occur more than once? –

+1

Sure, it's like "HAVING" in SQL – octavioccl

2

If you need all the words that occur more than once:

List<string> list = new List<string>();
list.Add("A");
list.Add("A");
list.Add("B");
var most = (from i in list
            group i by i into grp
            orderby grp.Count() descending
            select new { grp.Key, Cnt = grp.Count() }).Where(r => r.Cnt > 1);
1

If you want to get all of the words that are tied for most frequent, you can use this method:

public List<string> GetMostFrequentWords(List<string> list) 
{ 
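    // Materialize once so the grouping is not re-evaluated by Any(), First() and Where() below.
    var groups = list.GroupBy(x => x)
        .Select(x => new { word = x.Key, Count = x.Count() })
        .OrderByDescending(x => x.Count)
        .ToList();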
    if (!groups.Any()) return new List<string>(); 

    var maxCount = groups.First().Count; 

    return groups.Where(x => x.Count == maxCount).Select(x => x.word).OrderBy(x => x).ToList(); 
} 

[TestMethod] 
public void Test() 
{ 
    var list = @"Dubai,Karachi,Lahore,Madrid,Dubai,Sydney,Sharjah,Lahore,Cairo".Split(',').ToList(); 
    var result = GetMostFrequentWords(list); 

    Assert.AreEqual(2, result.Count); 
    Assert.AreEqual("Dubai", result[0]); 
    Assert.AreEqual("Lahore", result[1]); 
} 
1

If you want Dubai, Lahore (i.e. only the words with the top occurrence count, which is 2 in the sample):

List<String> list = new List<String>() {
    "Dubai", "Karachi", "Lahore", "Madrid", "Dubai", "Sydney", "Sharjah", "Lahore", "Cairo"
};

int count = -1;

var result = list
    .GroupBy(s => s, s => 1)
    .Select(chunk => new {
        name = chunk.Key,
        count = chunk.Count()
    })
    .OrderByDescending(item => item.count)
    .ThenBy(item => item.name)
    .Where(item => {
        if (count < 0) {
            count = item.count; // side effect, alas (we don't know the count a priori)

            return true;
        }
        else
            return item.count == count;
    })
    .Select(item => item.name);

Test:

// ans: Dubai, Lahore
Console.Write("ans: " + String.Join(", ", result));
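
The captured `count` variable can be avoided by materializing the counts first and then asking for the maximum. The following is only a sketch of that idea, not the answer author's code; the names `TopWordsDemo` and `TopWords` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopWordsDemo
{
    // Returns every word tied for the highest count, sorted by name.
    public static List<string> TopWords(IEnumerable<string> words)
    {
        // Materialize the counts once so they are not recomputed on each pass.
        var counts = words
            .GroupBy(s => s)
            .Select(g => new { Name = g.Key, Count = g.Count() })
            .ToList();

        // Max() throws on an empty sequence, so handle that case explicitly.
        if (counts.Count == 0)
            return new List<string>();

        int max = counts.Max(c => c.Count);

        return counts
            .Where(c => c.Count == max)
            .Select(c => c.Name)
            .OrderBy(n => n)
            .ToList();
    }

    public static void Main()
    {
        var list = new List<string> {
            "Dubai", "Karachi", "Lahore", "Madrid", "Dubai", "Sydney", "Sharjah", "Lahore", "Cairo"
        };

        // Prints "ans: Dubai, Lahore"
        Console.WriteLine("ans: " + string.Join(", ", TopWords(list)));
    }
}
```

The trade-off is an extra pass over the grouped counts to find the maximum, in exchange for a query with no state outside the lambdas.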
0

I'm sure there must be a better way, but here is something I managed to put together (it may help you, and you can optimize it further):

List<string> list = new List<string>();
list.Add("Dubai");
list.Add("Sarjah");
list.Add("Dubai");
list.Add("Lahor");
list.Add("Dubai");
list.Add("Sarjah");
list.Add("Sarjah");

int most = list.GroupBy(i => i).OrderByDescending(grp => grp.Count())
    .Select(grp => grp.Count()).First();
IEnumerable<string> mostVal = list.GroupBy(i => i).OrderByDescending(grp => grp.Count())
    .Where(grp => grp.Count() >= most)
    .Select(grp => grp.Key);

This lists the entries that occur most frequently; if two entries have the same frequency, both are included.

Note that this does not filter for entries that occur more than once.
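
One possible way to avoid grouping the list twice is to group the words by value and then group those groups by their count, so that the bucket with the highest count holds every tied word. This is a different technique from the code in this answer, shown only as a sketch; `MostFrequentDemo` and `MostFrequent` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MostFrequentDemo
{
    public static List<string> MostFrequent(IEnumerable<string> words)
    {
        var top = words
            .GroupBy(w => w)                       // one group per distinct word
            .GroupBy(g => g.Count(), g => g.Key)   // bucket the words by occurrence count
            .OrderByDescending(bucket => bucket.Key)
            .FirstOrDefault();                     // null when the input is empty

        return top == null ? new List<string>() : top.ToList();
    }

    public static void Main()
    {
        // Same data as in the answer: Dubai and Sarjah occur three times each.
        var list = new List<string> {
            "Dubai", "Sarjah", "Dubai", "Lahor", "Dubai", "Sarjah", "Sarjah"
        };

        // Prints "Dubai, Sarjah"
        Console.WriteLine(string.Join(", ", MostFrequent(list)));
    }
}
```

Words within the winning bucket come out in order of first appearance in the input; add an `OrderBy` if a sorted result is needed.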