Frequency analysis in Python - printing letters instead of numbers with their frequencies

s=array1 #user inputs an array with text in it 
n=len(s) 
f=arange(0,26,1) 
import collections 
dict = collections.defaultdict(int) 
for c in s: 
    dict[c] += 1 

for c in f: 
    print c,dict[c]/float(n)

In the output, c is a number rather than a letter, and I don't know how to convert it back to a letter.

Also, is there a way to put the frequencies/letters into an array so that they can be plotted as a histogram?


2011-05-08 PythonAlex

What is the IntArrayToText call? Is it a string? – 2011-05-08 03:48:52

To convert a number to the letter it represents, just use the built-in chr:

>>> chr(98) 
'b' 
>>> chr(66) 
'B' 
>>>
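Note that chr expects a character code, so if the numbers are alphabet positions 0 to 25 (as f=arange(0,26,1) in the question suggests), an offset is needed. A minimal sketch, assuming index 0 stands for 'a' and index 25 for 'z'; the helper name index_to_letter is made up for illustration:

```python
def index_to_letter(i):
    """Map an alphabet index 0..25 to a lowercase letter (assumed encoding)."""
    if not 0 <= i <= 25:
        raise ValueError("index must be in 0..25")
    # ord('a') is the character code of 'a'; adding the index shifts along the alphabet.
    return chr(ord('a') + i)

letters = [index_to_letter(i) for i in range(26)]
print(letters[0], letters[1], letters[25])  # a b z
```

If the numbers encode uppercase letters instead, the same idea should work with ord('A') as the offset.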


2011-05-08 03:42:22

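It should be pointed out that you are not calling map with arguments of the right types (hence the TypeError). It takes a function and one or more iterables, and applies the function to their elements. Your second argument is toChar[i], which would be a string. Iterables generally implement __iter__. To illustrate: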

>>> l, t = [],() 
>>> l.__iter__ 
<<< <method-wrapper '__iter__' of list object at 0x7ebcd6ac> 
>>> t.__iter__ 
<<< <method-wrapper '__iter__' of tuple object at 0x7ef6102c>
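To make the point about map concrete, here is a small sketch in Python 3 syntax. The question's original map call is not shown here, so this only illustrates the general rule; the list codes is invented for the example. Passing a single integer as the second argument raises TypeError because an int is not iterable:

```python
codes = [104, 105]  # hypothetical character codes, for illustration only

# map takes a function and an iterable; list() forces evaluation in Python 3.
chars = list(map(chr, codes))
print(chars)  # ['h', 'i']

# A bare integer is not iterable, so map rejects it.
try:
    map(chr, 104)
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```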

DTing's answer reminded me of collections.Counter:

>>> from collections import Counter 
>>> a = 'asdfbasdfezadfweradf' 
>>> dict((k, float(v)/len(a)) for k,v in Counter(a).most_common()) 
<<< 
{'a': 0.2, 
'b': 0.05, 
'd': 0.2, 
'e': 0.1, 
'f': 0.2, 
'r': 0.05, 
's': 0.1, 
'w': 0.05, 
'z': 0.05}


2011-05-08 03:53:08 zeekay

+1 I've never used that before, thanks! =) – DTing 2011-05-08 05:21:50

>>> a = "asdfbasdfezadfweradf" 
>>> import collections 
>>> counts = collections.defaultdict(int) 
>>> for letter in a: 
...  counts[letter]+=1 
... 
>>> print counts 
defaultdict(<type 'int'>, {'a': 4, 'b': 1, 'e': 2, 'd': 4, 'f': 4, 's': 2, 'r': 1, 'w': 1, 'z': 1}) 
>>> hist = dict((k, float(v)/len(a)) for k,v in counts.iteritems()) 
>>> print hist 
{'a': 0.2, 'b': 0.05, 'e': 0.1, 'd': 0.2, 'f': 0.2, 's': 0.1, 'r': 0.05, 'w': 0.05, 'z': 0.05}


2011-05-08 04:33:39 DTing

Nice! Reminds me of 'collections.Counter'. – zeekay 2011-05-08 05:03:05

To put the frequencies/letters into an array:

hisArray = [dict[c]/float(n) for c in f]
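For plotting, it may be convenient to keep the letters and their frequencies in two parallel lists. A sketch in Python 3, assuming the input is a plain string and only the lowercase letters a-z matter; the sample text is the one used in an earlier answer, and the variable names are illustrative:

```python
import string
from collections import Counter

text = "asdfbasdfezadfweradf"  # sample input, reused from an earlier answer
counts = Counter(text)
total = len(text)

# One entry per letter of the alphabet, in order; Counter returns 0 for missing keys.
letters = list(string.ascii_lowercase)
freqs = [counts[ch] / total for ch in letters]

print(letters[:4])  # ['a', 'b', 'c', 'd']
print(freqs[:4])    # [0.2, 0.05, 0.0, 0.2]
```

The two lists can then be handed to a bar-chart routine, for example matplotlib's pyplot.bar(letters, freqs), if that library is available.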


2011-05-08 04:35:45

If you are using Python 2.7 or later, you can use collections.Counter.

Python 2.7+

>>> import collections 
>>> s = "I want to count frequencies." 
>>> counter = collections.Counter(s) 
>>> counter 
Counter({' ': 4, 'e': 3, 'n': 3, 't': 3, 'c': 2, 'o': 2, 'u': 2, 'a': 1, 'f': 1, 'I': 1,  'q': 1, 'i': 1, 's': 1, 'r': 1, 'w': 1, '.': 1}) 
>>> n = sum(counter.values()) * 1.0 # Convert to float so division returns float. 
>>> n 
28.0
>>> [(char, count/n) for char, count in counter.most_common()] 
[(' ', 0.14285714285714285), ('e', 0.10714285714285714), ('n', 0.10714285714285714), ('t', 0.10714285714285714), ('c', 0.07142857142857142), ('o', 0.07142857142857142), ('u', 0.07142857142857142), ('a', 0.03571428571428571), ('f', 0.03571428571428571), ('I', 0.03571428571428571), ('q', 0.03571428571428571), ('i', 0.03571428571428571), ('s', 0.03571428571428571), ('r', 0.03571428571428571), ('w', 0.03571428571428571), ('.', 0.03571428571428571)]

Python 3+

>>> import collections 
>>> s = "I want to count frequencies." 
>>> counter = collections.Counter(s) 
>>> counter 
Counter({' ': 4, 'e': 3, 'n': 3, 't': 3, 'c': 2, 'o': 2, 'u': 2, 'a': 1, 'f': 1, 'I': 1,  'q': 1, 'i': 1, 's': 1, 'r': 1, 'w': 1, '.': 1}) 
>>> n = sum(counter.values()) 
>>> n 
28 
>>> [(char, count/n) for char, count in counter.most_common()] 
[(' ', 0.14285714285714285), ('e', 0.10714285714285714), ('n', 0.10714285714285714), ('t', 0.10714285714285714), ('c', 0.07142857142857142), ('o', 0.07142857142857142), ('u', 0.07142857142857142), ('a', 0.03571428571428571), ('f', 0.03571428571428571), ('I', 0.03571428571428571), ('q', 0.03571428571428571), ('i', 0.03571428571428571), ('s', 0.03571428571428571), ('r', 0.03571428571428571), ('w', 0.03571428571428571), ('.', 0.03571428571428571)]

This also returns the (char, frequency) tuples in descending order of frequency.
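If only letters should count, the input can be filtered before it reaches Counter. A sketch in Python 3, assuming case-insensitive counting is wanted; the function name letter_frequencies is hypothetical:

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, frequency) pairs, most frequent first, ignoring non-letters."""
    # str.isalpha() also accepts non-ASCII letters; tighten the test if only a-z is wanted.
    letters = [ch.lower() for ch in text if ch.isalpha()]
    if not letters:
        return []
    total = len(letters)
    return [(ch, n / total) for ch, n in Counter(letters).most_common()]

result = letter_frequencies("I want to count frequencies.")
for letter, freq in result[:3]:
    print(letter, round(freq, 3))
```

Because spaces and punctuation are dropped, the frequencies here are relative to the number of letters rather than to the full length of the string.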


2011-05-08 05:08:00
