How to get the 10 most frequent strings in a list in Python

I have a list containing 93 different strings. I need to find the 10 most frequent strings, and the result must be ordered from most frequent to least frequent.

mylist = ['"and', '"beware', '`twas', 'all', 'all', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig'] 
# this is just a sample of the actual list.

I do not have the latest version of Python and cannot use Counter.

Source

2012-04-11 Keely Aranyos

You can use Counter from the collections module to do this.

from collections import Counter 
c = Counter(mylist)

Then calling c.most_common(10) returns

[('and', 13), 
('all', 2), 
('as', 2), 
('borogoves', 2), 
('boy', 1), 
('blade', 1), 
('bandersnatch', 1), 
('beware', 1), 
('bite', 1), 
('arms', 1)]

Source

2012-04-11 04:04:23

Accept this answer! – 2012-04-11 04:05:46

That's it. No more beating around the bush. – 2012-04-11 04:22:36

I don't have the latest version of Python and can't use Counter – 2012-04-11 04:41:45

Without using Counter, as the modified version of the question requires.

Changed to use heapq.nlargest, as suggested by @Duncan.

>>> from collections import defaultdict 
>>> from operator import itemgetter 
>>> from heapq import nlargest 
>>> mylist = ['"and', '"beware', '`twas', 'all', 'all', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig'] 
>>> c = defaultdict(int) 
>>> for item in mylist: 
     c[item] += 1 


>>> [word for word,freq in nlargest(10,c.iteritems(),key=itemgetter(1))] 
['and', 'all', 'as', 'borogoves', 'boy', 'blade', 'bandersnatch', 'beware', 'bite', 'arms']

Source

2012-04-11 04:05:43 jamylak

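I don't have the latest version of Python and can't use Counter – 2012-04-11 04:45:29

Do you have 'defaultdict'? Try 'from collections import defaultdict'; if that works, I can write a quick solution. – jamylak 2012-04-11 04:48:02

Yes, I do have it – 2012-04-11 04:49:14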

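David's suggested answer is the best. If you are using a version of Python that does not include Counter from the collections module (it was introduced in Python 2.7), you can use this implementation of the Counter class to do the same thing. I suspect it will be slower than the module, but it will do the same thing.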
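One way to cope with both old and new interpreters is to try the import and fall back to manual counting when it fails. The following is only a minimal sketch of that idea; the helper name top_n is hypothetical and does not come from the linked implementation, and the order of items with equal counts is not guaranteed to match between the two branches.

```python
from collections import defaultdict

try:
    from collections import Counter  # available from Python 2.7 onward
except ImportError:
    Counter = None  # older interpreter: fall back to manual counting


def top_n(items, n=10):
    """Return the n most frequent items as (item, count) pairs, most frequent first.

    top_n is a hypothetical helper name used only for this illustration.
    """
    if Counter is not None:
        return Counter(items).most_common(n)
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    # Sort the (item, count) pairs by count, highest first, and keep the first n.
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:n]
```

Calling top_n(mylist) would then give the ten most frequent words on either kind of interpreter.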

Source

2012-04-11 04:54:27

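Counter is not included in Python 2.4, only in 2.7. The documentation says so - http://docs.python.org/library/collections.html#collections.Counter – 2012-04-11 04:59:20

Yes, I have updated my answer to reflect the correct version. However, the solution provided works on versions before 2.7. – 2012-04-11 05:00:38

Cool - this snippet was written by Raymond Hettinger (the author of a lot of this stuff) and is very much like the 2.7 source code. Nice find. :) – 2012-04-11 05:07:00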

David's solution is the best.

But, probably more for fun than anything else, here is a solution that does not import any modules:

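dicto = {}

# Count each word by hand; no imports needed.
for ele in mylist:
    try:
        dicto[ele] += 1
    except KeyError:
        dicto[ele] = 1

# Sort the (word, count) pairs by count, highest first, and keep the first 10.
# iteritems() is Python 2; on Python 3 use dicto.items().
top_10 = sorted(dicto.iteritems(), key=lambda k: k[1], reverse=True)[:10]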

Result:

>>> top_10 
[('and', 13), ('all', 2), ('as', 2), ('borogoves', 2), ('boy', 1), ('blade', 1), ('bandersnatch', 1), ('beware', 1), ('bite', 1), ('arms', 1)]

Edit:

In answer to the follow-up question:

new_dicto = {}

# Group the words by their count: {count: [word, ...]}.
for key, val in dicto.iteritems():
    try:
        new_dicto[val].append(key)
    except KeyError:
        new_dicto[val] = [key]

# Sort each group alphabetically, then order the groups by count, highest first.
alph_sorted = sorted([(count, sorted(words)) for count, words in new_dicto.iteritems()], reverse=True)

Result:

>>> alph_sorted 
[(13, ['and']), (2, ['all', 'as', 'borogoves']), (1, ['"and', '"beware', '`twas', 'arms', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'boy', 'brillig'])]

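Words with the same number of occurrences are sorted alphabetically. Note that some of the words have extra quote characters attached to them.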
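The same ordering (highest count first, alphabetical within equal counts) can also be produced with a single sort on a composite key instead of grouping by count. This is only a sketch of that alternative; the function name top_n_alphabetical and the sample list are made up for illustration.

```python
def top_n_alphabetical(words, n=10):
    """Return the n most frequent (word, count) pairs; ties are ordered alphabetically.

    top_n_alphabetical is a hypothetical name used only for this illustration.
    """
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    # Negating the count sorts it in descending order, while the word itself
    # stays in ascending (alphabetical) order as the tie-breaker.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


sample = ['and', 'all', 'as', 'and', 'boy', 'all', 'and', 'as']
print(top_n_alphabetical(sample, 3))
# prints [('and', 3), ('all', 2), ('as', 2)]
```

Because the key is a tuple, no second pass over the grouped counts is needed; the slice at the end simply keeps the first n entries.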

Edit:

In answer to another follow-up question:

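top_10 = []

for tup in alph_sorted:
    for word in tup[1]:
        top_10.append(word)
        if len(top_10) == 10:
            break
    # The inner break only leaves the inner loop; stop the outer loop too,
    # otherwise any later groups would keep appending words.
    if len(top_10) == 10:
        break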

Result:

>>> top_10 
['and', 'all', 'as', 'borogoves', '"and', '"beware', '`twas', 'arms', 'awhile', 'back']

Source

2012-04-11 05:10:29 Akavall

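So how would you get the words that have the same count, and how would you sort them alphabetically? – 2012-04-11 05:21:45

How do I get the top 10 from alph_sorted? – 2012-04-12 03:25:01

@KeelyAranyos I edited my post to answer your second question; hopefully it gives you what you are looking for. – Akavall 2012-04-15 04:20:23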

If your Python version does not support Counter, you can do it the way Counter is implemented:

>>> import operator,collections,heapq 
>>> counter = collections.defaultdict(int) 
>>> for elem in mylist: 
    counter[elem]+=1   
>>> heapq.nlargest(10,counter.iteritems(),operator.itemgetter(1)) 
[('and', 13), ('all', 2), ('as', 2), ('borogoves', 2), ('boy', 1), ('blade', 1), ('bandersnatch', 1), ('beware', 1), ('bite', 1), ('arms', 1)]

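If you look at the Counter class, it builds a dictionary of the occurrences of every element in the iterable. It then puts the data into a heapq, keyed on the dictionary values, and retrieves the largest entries.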
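The two steps described here (count into a dictionary, then pick the largest counts with a heap) can be wrapped up as two small functions. This is a simplified sketch of the idea, not the actual Counter source; the names count_items and most_common are chosen only for this illustration.

```python
import heapq
from operator import itemgetter


def count_items(iterable):
    """Step 1: build a dictionary mapping each element to its number of occurrences."""
    counts = {}
    for elem in iterable:
        counts[elem] = counts.get(elem, 0) + 1
    return counts


def most_common(iterable, n):
    """Step 2: use a heap to pick the n (element, count) pairs with the largest counts."""
    counts = count_items(iterable)
    # itemgetter(1) makes nlargest compare the pairs by their count.
    return heapq.nlargest(n, counts.items(), key=itemgetter(1))
```

For a small n, heapq.nlargest avoids sorting every distinct element, which is the reason to use a heap here rather than a full sort.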

Source

2012-04-11 05:36:08 Abhijit
