2011-05-08 114 views
0
s=array1 #user inputs an array with text in it 
n=len(s) 
f=arange(0,26,1) 
import collections 
dict = collections.defaultdict(int) 
for c in s: 
    dict[c] += 1 

for c in f: 
    print c,dict[c]/float(n) 

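In the output, c is a number rather than a letter, and I don't know how to convert it back to a letter.

Title: Frequency analysis in Python - printing letters, rather than numbers, with their frequencies

Also, is there any way to put the frequencies/letters into an array so that they can be plotted as a histogram?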

+0

What is the IntArrayToText call? Is it a string? – 2011-05-08 03:48:52

Answers

1

To convert a number into the character it represents, just use the built-in chr:

>>> chr(98) 
'b' 
>>> chr(66) 
'B' 
>>> 
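Note that chr works on character codes, so chr(0) through chr(25) are control characters, not letters. If the array in the question holds letter indices 0 to 25 (an assumption, suggested by f=arange(0,26,1) but not stated outright), an offset from ord('a') is needed. A minimal sketch in Python 3 syntax; the name index_to_letter and the sample data are made up for illustration:

```python
from collections import Counter

def index_to_letter(i):
    """Map 0..25 to 'a'..'z'. The 0-25 encoding is an assumption."""
    if not 0 <= i < 26:
        raise ValueError("index out of range: %r" % (i,))
    return chr(ord('a') + i)

# Hypothetical sample standing in for the user's integer array ("hello").
codes = [7, 4, 11, 11, 14]
counts = Counter(codes)
n = float(len(codes))

# Relative frequency for every letter, keyed by the letter itself.
freqs = {index_to_letter(i): counts[i] / n for i in range(26)}

for letter in sorted(freqs):
    if freqs[letter] > 0:
        print(letter, freqs[letter])
```

Counter returns 0 for indices that never occur, so letters missing from the input simply get a frequency of 0.0.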
4

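It should be noted that you are not calling map with arguments of the right type (hence the TypeError). It takes a function and one or more iterables to which the function is applied. Your second argument is toChar[i], which would be a string. All iterables implement __iter__. To illustrate: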

>>> l, t = [],() 
>>> l.__iter__ 
<<< <method-wrapper '__iter__' of list object at 0x7ebcd6ac> 
>>> t.__iter__ 
<<< <method-wrapper '__iter__' of tuple object at 0x7ef6102c> 
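As a rough illustration of the calling convention described above, the sketch below passes map a function and an iterable, then shows what happens when the second argument is not iterable. It is written in Python 3 syntax, where map returns a lazy iterator and so is wrapped in list; the sample codes are made up.

```python
codes = [97, 98, 99]

# map(function, iterable): chr is applied to each element in turn.
letters = list(map(chr, codes))
word = "".join(map(chr, codes))
print(letters, word)

# Passing a non-iterable (a bare int) as the second argument fails.
error_message = None
try:
    map(chr, 97)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```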

DTing's answer reminded me of collections.Counter:

>>> from collections import Counter 
>>> a = 'asdfbasdfezadfweradf' 
>>> dict((k, float(v)/len(a)) for k,v in Counter(a).most_common()) 
<<< 
{'a': 0.2, 
'b': 0.05, 
'd': 0.2, 
'e': 0.1, 
'f': 0.2, 
'r': 0.05, 
's': 0.1, 
'w': 0.05, 
'z': 0.05} 
+0

+1 I've never used that before, thanks! =) – DTing 2011-05-08 05:21:50

1
>>> a = "asdfbasdfezadfweradf" 
>>> import collections 
>>> counts = collections.defaultdict(int) 
>>> for letter in a: 
...  counts[letter]+=1 
... 
>>> print counts 
defaultdict(<type 'int'>, {'a': 4, 'b': 1, 'e': 2, 'd': 4, 'f': 4, 's': 2, 'r': 1, 'w': 1, 'z': 1}) 
>>> hist = dict((k, float(v)/len(a)) for k,v in counts.iteritems()) 
>>> print hist 
{'a': 0.2, 'b': 0.05, 'e': 0.1, 'd': 0.2, 'f': 0.2, 's': 0.1, 'r': 0.05, 'w': 0.05, 'z': 0.05} 
+1

Nice! Reminds me of 'collections.Counter'. – zeekay 2011-05-08 05:03:05

0

To put the frequencies/letters into an array:

hisArray = [dict[c]/float(n) for c in f] 
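For plotting, it can help to keep the letters and their frequencies in two parallel lists of the same length. The following is a sketch only, in Python 3 syntax, using the sample string from the other answers rather than the asker's data; it assumes lowercase ASCII text and draws a crude text histogram so that it needs no plotting library.

```python
import string
from collections import Counter

text = "asdfbasdfezadfweradf"  # sample string borrowed from the other answers
counts = Counter(text)
total = float(len(text))

# Parallel lists: letters[i] has relative frequency freqs[i].
letters = list(string.ascii_lowercase)
freqs = [counts[ch] / total for ch in letters]

# Crude text histogram: one '#' per 5% of the input.
for ch, fr in zip(letters, freqs):
    if fr > 0:
        print("%s %.2f %s" % (ch, fr, "#" * int(round(fr * 20))))
```

The two lists can then be handed to a plotting library. With matplotlib, for example, one common approach is pyplot.bar(range(26), freqs) with the letters set as tick labels.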
3

If you are using Python 2.7 or later, you can use collections.Counter.

Python 2.7+

>>> import collections 
>>> s = "I want to count frequencies." 
>>> counter = collections.Counter(s) 
>>> counter 
Counter({' ': 4, 'e': 3, 'n': 3, 't': 3, 'c': 2, 'o': 2, 'u': 2, 'a': 1, 'f': 1, 'I': 1,  'q': 1, 'i': 1, 's': 1, 'r': 1, 'w': 1, '.': 1}) 
>>> n = sum(counter.values()) * 1.0 # Convert to float so division returns float. 
>>> n 
28.0
>>> [(char, count/n) for char, count in counter.most_common()] 
[(' ', 0.14285714285714285), ('e', 0.10714285714285714), ('n', 0.10714285714285714), ('t', 0.10714285714285714), ('c', 0.07142857142857142), ('o', 0.07142857142857142), ('u', 0.07142857142857142), ('a', 0.03571428571428571), ('f', 0.03571428571428571), ('I', 0.03571428571428571), ('q', 0.03571428571428571), ('i', 0.03571428571428571), ('s', 0.03571428571428571), ('r', 0.03571428571428571), ('w', 0.03571428571428571), ('.', 0.03571428571428571)] 

Python 3+

>>> import collections 
>>> s = "I want to count frequencies." 
>>> counter = collections.Counter(s) 
>>> counter 
Counter({' ': 4, 'e': 3, 'n': 3, 't': 3, 'c': 2, 'o': 2, 'u': 2, 'a': 1, 'f': 1, 'I': 1,  'q': 1, 'i': 1, 's': 1, 'r': 1, 'w': 1, '.': 1}) 
>>> n = sum(counter.values()) 
>>> n 
28 
>>> [(char, count/n) for char, count in counter.most_common()] 
[(' ', 0.14285714285714285), ('e', 0.10714285714285714), ('n', 0.10714285714285714), ('t', 0.10714285714285714), ('c', 0.07142857142857142), ('o', 0.07142857142857142), ('u', 0.07142857142857142), ('a', 0.03571428571428571), ('f', 0.03571428571428571), ('I', 0.03571428571428571), ('q', 0.03571428571428571), ('i', 0.03571428571428571), ('s', 0.03571428571428571), ('r', 0.03571428571428571), ('w', 0.03571428571428571), ('.', 0.03571428571428571)] 

This also returns the (char, frequency) tuples in descending order of frequency.