Item frequency count in Python

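I'm a Python newbie, so maybe my question is very noob. Assume I have a list of words, and I want to find the number of times each word appears in that list. The obvious way to do this is: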

words = "apple banana apple strawberry banana lemon" 
uniques = set(words.split()) 
freqs = [(item, words.split().count(item)) for item in uniques]  # split must be called: split()
print(freqs)

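But I don't find this code very good, because the program runs through the word list twice: once to build the set, and a second time to count the number of appearances. Of course, I could write a function that runs through the list and does the counting, but that wouldn't be very Pythonic. So, is there a more efficient and Pythonic way?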

2009-05-21 Daniyar

It's not twice, it looks like O(N*N) complexity – Drakosha 2009-05-21 15:10:37

@Drakosha: Agreed, I just noticed that as well. – 2009-05-21 15:12:26

Yes, the complexity is O(n^2), but the list itself is run through twice. – Daniyar 2009-05-21 15:15:19
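
To make the complexity point in these comments concrete, here is a minimal sketch (the variable `scans` is only an illustration, not part of the question). Each `list.count` call scans the whole list, and it is called once per unique word, so the work grows as the number of unique words times the list length:

```python
words = "apple banana apple strawberry banana lemon".split()
uniques = set(words)

# One full scan of the list per unique word:
# 4 unique words * 6 words = 24 element comparisons.
scans = len(uniques) * len(words)
print(scans)

# Same result as the question's code, stored as a dict.
freqs = {item: words.count(item) for item in uniques}
print(freqs)
```

When nearly every word is unique, the number of unique words approaches the list length, which is where the O(N*N) figure comes from.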

defaultdict to the rescue!

from collections import defaultdict 

words = "apple banana apple strawberry banana lemon" 

d = defaultdict(int) 
for word in words.split(): 
    d[word] += 1

This runs in O(n).
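
As a small sketch of why `d[word] += 1` works without initializing anything: `defaultdict(int)` calls `int()` for a missing key, which returns 0, and stores that value under the key. The key names below are just placeholders:

```python
from collections import defaultdict

d = defaultdict(int)

# Looking up a missing key calls int(), stores the result, and returns it.
print(d["missing"])  # 0

# So the first increment of a new key starts from 0.
d["apple"] += 1
print(dict(d))
```

Note that merely reading a missing key inserts it, which is worth keeping in mind if you later iterate over the dictionary.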

2009-05-21 15:10:59 Triptych

+1, collections.defaultdict is one of my favorite containers! – 2009-05-21 15:12:25

If the collection is a tree I'd say O(N log N), or if it's a hash, then O(N) on average. – Drakosha 2009-05-21 15:15:53

dict is a hash. – 2009-05-21 15:17:55

If you don't want to use the standard dictionary approach, you can try this:

>>> from itertools import groupby 
>>> myList = words.split() # ['apple', 'banana', 'apple', 'strawberry', 'banana', 'lemon'] 
>>> [(k, len(list(g))) for k, g in groupby(sorted(myList))] 
[('apple', 2), ('banana', 2), ('lemon', 1), ('strawberry', 1)]

It runs in O(n log n) time.
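
The sort is what costs O(n log n), and it is not optional: `groupby` only merges adjacent equal items. A minimal sketch of the difference (the names `unsorted_groups` and `sorted_groups` are illustrative):

```python
from itertools import groupby

my_list = "apple banana apple strawberry banana lemon".split()

# Without sorting, equal words that are not adjacent end up in separate groups.
unsorted_groups = [(k, len(list(g))) for k, g in groupby(my_list)]
print(unsorted_groups)

# After sorting, equal words sit next to each other and are counted together.
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(my_list))]
print(sorted_groups)
```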

2009-05-21 15:09:57

The standard approach:

from collections import defaultdict 

words = "apple banana apple strawberry banana lemon" 
words = words.split() 
result = defaultdict(int)  # defaultdict was imported by name, so no "collections." prefix
for word in words: 
    result[word] += 1 

print(result)

groupby one-liner:

from itertools import groupby 

words = "apple banana apple strawberry banana lemon" 
words = words.split() 

result = dict((key, len(list(group))) for key, group in groupby(sorted(words))) 
print(result)

2009-05-21 15:11:47 nosklo

If you are using Python 2.7+/3.1+, there is a Counter class in the collections module that is purpose-built for this type of problem:

>>> from collections import Counter 
>>> words = "apple banana apple strawberry banana lemon" 
>>> freqs = Counter(words.split()) 
>>> print(freqs) 
Counter({'apple': 2, 'banana': 2, 'strawberry': 1, 'lemon': 1}) 
>>>

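Since both 2.7 and 3.1 are still in beta, it's unlikely you're using them, so just keep in mind that a standard way of doing this kind of work will soon be readily available.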
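
A short sketch of two Counter conveniences that the plain-dict approaches don't give you for free: `most_common` for ranking, and a count of 0 for words that never appeared. The word "kiwi" and the name `top_two` are just illustrative:

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon"
freqs = Counter(words.split())

# The two most frequent words, as (word, count) pairs.
top_two = freqs.most_common(2)
print(top_two)

# A word that never appeared counts as 0 instead of raising KeyError.
print(freqs["kiwi"])
```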

2009-05-21 15:16:59 sykora

Without defaultdict:

words = "apple banana apple strawberry banana lemon" 
my_count = {} 
for word in words.split(): 
    try: my_count[word] += 1 
    except KeyError: my_count[word] = 1

2009-05-21 15:59:30

words = "apple banana apple strawberry banana lemon".split()  # assumed input: a list of words
freqs = {} 
for word in words: 
    freqs[word] = freqs.get(word, 0) + 1 # fetch and increment OR initialize

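I think this gives the same result as Triptych's solution, but without importing collections. It's also a bit like Selinap's solution, but more readable imho. It's almost identical to Thomas Weigel's solution, but without using exceptions.

However, this may be slower than using defaultdict() from the collections library, since the value is fetched, incremented, and then assigned again, rather than just incremented. Then again, using += may well do the same thing internally.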
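
Whether `get` is actually slower is easy to measure rather than guess. Here is a rough sketch using `timeit`; the function names are made up for illustration, and the timings depend on your machine and Python version, so no numbers are shown:

```python
import timeit
from collections import defaultdict

words = "apple banana apple strawberry banana lemon".split() * 1000

def count_with_get(items):
    # Fetch the current count (0 if absent), add one, assign it back.
    freqs = {}
    for item in items:
        freqs[item] = freqs.get(item, 0) + 1
    return freqs

def count_with_defaultdict(items):
    # Missing keys are created with the value 0 on first access.
    freqs = defaultdict(int)
    for item in items:
        freqs[item] += 1
    return freqs

# Both approaches must agree before timing them means anything.
assert count_with_get(words) == count_with_defaultdict(words)

print("get:        ", timeit.timeit(lambda: count_with_get(words), number=100))
print("defaultdict:", timeit.timeit(lambda: count_with_defaultdict(words), number=100))
```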

2009-06-11 20:21:44 hopla

Can't you just use count?

words = 'the quick brown fox jumps over the lazy gray dog' 
words.count('z') 
#output: 1
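
One caveat with this, shown as a small sketch: calling `count` on the string counts substring occurrences, not words, so it only answers the original question if you split first. The search term 'he' is just an example:

```python
words = 'the quick brown fox jumps over the lazy gray dog'

# str.count matches substrings: 'he' occurs inside both occurrences of 'the'.
substring_hits = words.count('he')
print(substring_hits)  # 2

# list.count on the split words matches whole words only.
word_hits = words.split().count('he')
print(word_hits)  # 0

print(words.split().count('the'))  # 2
```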

2011-04-07 05:36:08 Antonio

The answer below takes some extra cycles, but it is another approach

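def func(tup): 
    # Sort key: the count, i.e. the last element of a (word, count) tuple
    return tup[-1] 


def print_words(filename): 
    # Read the file passed in (the original ignored the argument) and lowercase it
    with open(filename, 'r') as f: 
        whole_content = f.read().lower() 
    print(whole_content) 
    list_content = whole_content.split() 
    counts = {}  # renamed from "dict" to avoid shadowing the builtin
    # First pass: initialize every word's count to 0
    for one_word in list_content: 
        counts[one_word] = 0 
    # Second pass: add one for each occurrence
    for one_word in list_content: 
        counts[one_word] += 1 
    print(list(counts.items())) 
    # Word/count pairs in ascending order of count
    print(sorted(counts.items(), key=func))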

2013-02-27 02:17:20

I happened to be working on a Spark exercise; here is my solution.

tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog'] 

print({n: float(tokens.count(n))/float(len(tokens)) for n in tokens})

**Output of the above**

{'brown': 0.16666666666666666, 'lazy': 0.16666666666666666, 'jumps': 0.16666666666666666, 'fox': 0.16666666666666666, 'dog': 0.16666666666666666, 'quick': 0.16666666666666666}
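
Since `tokens.count(n)` rescans the list for every token, this has the same O(n^2) behaviour discussed in the comments on the question. A possible single-pass variant, sketched with `collections.Counter` and assuming Python 3 division (the name `rel_freqs` is illustrative):

```python
from collections import Counter

tokens = ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']

counts = Counter(tokens)  # one pass over the list
total = len(tokens)

# Relative frequency of each distinct token.
rel_freqs = {word: n / total for word, n in counts.items()}
print(rel_freqs)
```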

2015-06-26 06:02:07 javaidiot

Use reduce() to convert the list to a single dict.

from functools import reduce  # reduce is not a builtin in Python 3

words = "apple banana apple strawberry banana lemon" 
reduce(lambda d, c: d.update([(c, d.get(c,0)+1)]) or d, words.split(), {})

returns

{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}
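
The lambda relies on `dict.update` returning None, so `... or d` hands the updated dict back to `reduce` as the accumulator. A sketch of the same step written out as a named function (`add_word` is a hypothetical helper, not part of the answer) may be easier to read:

```python
from functools import reduce

def add_word(counts, word):
    # Same step as the lambda: bump the word's count, return the accumulator.
    counts[word] = counts.get(word, 0) + 1
    return counts

words = "apple banana apple strawberry banana lemon"
result = reduce(add_word, words.split(), {})
print(result)
```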

2016-02-23 18:03:45 Gadi

words = "apple banana apple strawberry banana lemon" 
w=words.split() 
e=list(set(w))  
for i in e: 
    print(w.count(i)) #Prints frequency of every word in the list

Hope this helps!

2017-11-12 16:17:28
