2011-12-22

How to store cumulative output in a list?

I use the following code to get the letter frequencies in a text:

f_list = []
text = rawpunct.lower()
for s in 'abcdefghijklmnopqrstuvwxyz ':
    count = 0
    for char in text:
        if s == char:
            count += 1
    result = '%.3f' % (count / float(len(text)))  # fraction of the whole text
    f_list.append(result)

The result is:

['0.061', '0.012', '0.017', '0.030', '0.093', '0.016', '0.016', 
'0.049', '0.050', '0.001', '0.006', '0.034', '0.018', '0.052', '0.055', 
'0.013', '0.001', '0.041', '0.050', '0.069', '0.021', '0.007', '0.017', 
'0.001', '0.013', '0.000', '0.159'] 

but I want to store the cumulative frequencies instead, i.e. create this list:

['0.061', '0.073', '0.090', '0.120' ............ ] 

Does anyone know how to do this?


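This isn't what you asked, but note that this reads through the whole text 27 times, when you could get the same result by reading it just once. Simply create a dictionary that maps characters to counts, like this: `counts = {'a': 0, 'b': 0, ...}`, or equivalently `counts = dict((c, 0) for c in 'abcdefghijklmnopqrstuvwxyz ')`. Then go through the text once; for each `c` in the text, do `counts[c] += 1`. Finally, you can build a new cumulative list using the methods described below. – senderle 2011-12-22 15:53:54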


Also useful for operations like this: [`defaultdict`](http://docs.python.org/library/collections.html#collections.defaultdict) and [`Counter`](http://docs.python.org/library/collections.html#collections.Counter). – senderle 2011-12-22 15:54:04
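A minimal sketch of what these two comments describe, assuming a non-empty text: one pass over the text with a `defaultdict(int)` (so characters outside the alphabet need no `KeyError` handling), then one pass over the 27 symbols to build the running totals. The names `ALPHABET` and `cumulative_letter_frequencies` are hypothetical; the denominator is the full text length, as in the question's code.

```python
from collections import defaultdict

ALPHABET = 'abcdefghijklmnopqrstuvwxyz '  # letters plus space, as in the question


def cumulative_letter_frequencies(text, alphabet=ALPHABET):
    """Count characters in one pass, then build running totals of their frequencies."""
    text = text.lower()
    counts = defaultdict(int)  # unseen characters start at 0, so no KeyError
    for c in text:  # single pass over the whole text
        counts[c] += 1

    total = float(len(text))  # same denominator as the question: the whole text
    cumulative = []
    running = 0.0
    for c in alphabet:  # keep the alphabet order of the original loop
        running += counts[c] / total
        cumulative.append('%.3f' % running)  # format only after summing
    return cumulative


print(cumulative_letter_frequencies('Abba, a cab!'))
```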

Answers


Just for the fun of a one-liner:

original = ['0.061', '0.012', '0.017', '0.030', '0.093', '0.016', '0.016', 
'0.049', '0.050', '0.001', '0.006', '0.034', '0.018', '0.052', '0.055', 
'0.013', '0.001', '0.041', '0.050', '0.069', '0.021', '0.007', '0.017', 
'0.001', '0.013', '0.000', '0.159'] 

result = [sum(float(item) for item in original[:rank + 1]) for rank in range(len(original))] 

>>> result
[0.061, 0.073, 0.09, 0.12, 0.213, 0.22899999999999998, 0.245, 0.294, 0.344, 0.345, 0.351, 0.385, 0.403, 0.455, 0.51, 0.523, 0.524, 0.5650000000000001, 0.6150000000000001, 0.6840000000000002, 0.7050000000000002, 0.7120000000000002, 0.7290000000000002, 0.7300000000000002, 0.7430000000000002, 0.7430000000000002, 0.9020000000000002] 
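The one-liner re-sums the whole prefix at every position, so its work grows quadratically, and the float sums pick up rounding noise such as `0.22899999999999998`. A sketch of a single-pass alternative that keeps one running total and uses `decimal.Decimal`, which represents three-decimal strings like `'0.061'` exactly; `cumulative_decimal` is a hypothetical name.

```python
from decimal import Decimal


def cumulative_decimal(values):
    """Running totals of decimal strings, computed exactly in one pass."""
    total = Decimal('0')
    result = []
    for item in values:
        total += Decimal(item)  # exact decimal arithmetic, no binary rounding
        result.append(str(total))
    return result


original = ['0.061', '0.012', '0.017', '0.030', '0.093']
print(cumulative_decimal(original))
# ['0.061', '0.073', '0.090', '0.120', '0.213']
```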
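Alternatively, keep a running total inside your loop: instead of appending each value on its own, add it to the last element of `f_list` (this needs `result` to be a number rather than a tuple or a formatted string):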
if len(f_list) == 0: 
    f_list.append(result) 
else: 
    f_list.append(f_list[-1] + result) 
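Applied to the whole loop, seeding `f_list` with `0` so that `f_list[-1]` always exists, then dropping the seed at the end: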
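text = rawpunct.lower()
f_list = [0]  # seed, so that f_list[-1] exists on the first iteration
for s in 'abcdefghijklmnopqrstuvwxyz ':
    count = 0
    for char in text:
        if s == char:
            count += 1
    result = count / float(len(text))  # a number, not a (letter, string) tuple
    f_list.append(result + f_list[-1])
f_list = ['%.3f' % f for f in f_list[1:]]  # drop the seed, then format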
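The seed-and-drop pattern from this loop, pulled out into a small function so it can be reused on any sequence of numbers. A sketch; `cumulative_sums` is a hypothetical name, and string values such as `'0.061'` must be converted with `float()` first.

```python
def cumulative_sums(values):
    """Running totals using a seed element that is dropped at the end."""
    totals = [0]  # seed so totals[-1] always exists, even on the first step
    for value in values:
        totals.append(totals[-1] + value)
    return totals[1:]  # drop the seed


print(cumulative_sums([1, 2, 3]))  # [1, 3, 6]
print(['%.3f' % t for t in cumulative_sums(float(s) for s in ['0.061', '0.012', '0.017'])])
```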
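A version that counts all characters in a single pass over the text and then builds the running totals: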
letters = 'abcdefghijklmnopqrstuvwxyz '
text = rawpunct.lower()
counts = dict.fromkeys(letters, 0)
for char in text:
    try:
        counts[char] += 1
    except KeyError:
        pass  # this character in rawpunct should not be counted!
f_list = [0]
for s in letters:
    # divide by the text length so the running total is a frequency, not a raw count
    f_list.append(f_list[-1] + counts[s] / float(len(text)))
str_list = ['{0:.3f}'.format(f) for f in f_list[1:]]

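`f_list` is a list of floats (it is easier to compute the sums with floats than with their string representations!). At the end, I create `str_list`, the list of string representations of these floats. Since you don't want your list to start with zero, the seed `0` is dropped at the end (only `f_list[1:]` is used).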

If your input text is long, this solution is faster because it reads the text only once.
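The same single-pass idea can be delegated to `collections.Counter`, one of the tools mentioned in the comments, with `itertools.accumulate` producing the running totals. A minimal sketch assuming Python 3 (true division, and `accumulate` needs 3.2+) and a non-empty text; `cumulative_with_counter` is a hypothetical name.

```python
from collections import Counter
from itertools import accumulate

ALPHABET = 'abcdefghijklmnopqrstuvwxyz '  # letters plus space, as in the question


def cumulative_with_counter(text, alphabet=ALPHABET):
    """Cumulative letter frequencies via Counter (one pass) and accumulate."""
    text = text.lower()
    counts = Counter(text)  # missing characters report a count of 0
    total = len(text)
    freqs = [counts[c] / total for c in alphabet]  # per-character fractions
    return ['%.3f' % f for f in accumulate(freqs)]  # running sums, then format


print(cumulative_with_counter('Abba, a cab!'))
```

Formatting happens only after summing, so the rounding to three decimals is applied once per total rather than accumulating across steps.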


You can `import numpy`, convert your list of frequency strings into a float array with `results = numpy.array(f_list, dtype=float)`, and finally compute `f_list = numpy.cumsum(results)`.
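`numpy.cumsum` returns the running totals of its input as a new array. Where NumPy is not available, a plain-Python generator produces the same running totals; this is a swapped-in technique rather than part of this answer, and `running_totals` is a hypothetical name.

```python
def running_totals(values):
    """Yield the running total after each value (like numpy.cumsum, but lazy)."""
    total = 0.0
    for value in values:
        total += float(value)  # accepts numbers or numeric strings such as '0.061'
        yield total


strings = ['0.061', '0.012', '0.017', '0.030']
print(['%.3f' % t for t in running_totals(strings)])
# ['0.061', '0.073', '0.090', '0.120']
```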


A `cumsum` version using `reduce`:

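In [1]: from functools import reduce  # built in on Python 2; must be imported on Python 3
In [2]: x = [1, 2, 3]
In [3]: reduce(lambda acc, v: acc + [acc[-1] + v], x[1:], x[:1])
Out[3]: [1, 3, 6]

It also works for an empty list:

In [4]: x = []
In [5]: reduce(lambda acc, v: acc + [acc[-1] + v], x[1:], x[:1])
Out[5]: []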
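Because `acc + [...]` builds a new list at every step, this `reduce` version does quadratic work on long inputs. A variant that still uses `functools.reduce` but appends to the accumulator in place, keeping the same empty-list behaviour; the names `_append_total` and `cumsum_reduce` are hypothetical.

```python
from functools import reduce


def _append_total(acc, value):
    """Append the new running total to acc and return it for the next step."""
    acc.append(acc[-1] + value)  # mutates acc instead of copying it
    return acc


def cumsum_reduce(xs):
    # Seed with a copy of the first element; an empty input gives an empty result.
    return reduce(_append_total, xs[1:], list(xs[:1]))


print(cumsum_reduce([1, 2, 3]))  # [1, 3, 6]
print(cumsum_reduce([]))         # []
```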

I assume `rawpunct` is the string containing your text. In my suggestion I replaced it with a sample text:

from string import ascii_lowercase 

text = 'Some arbitrary Text with NonNSense! @#!.+-'.lower() 
chmap = ascii_lowercase + ' ' 
cooked_text = ''.join([i for i in text if i in chmap])  # keep only letters and spaces 
chdict = dict.fromkeys(chmap, 0)  # set up totals dict 
frequencies = dict.fromkeys(chmap, 0)  # set up fractions dict 

for ch in cooked_text:  # totals per char 
    chdict[ch] += 1 

for char in chdict:  # relative to text length 
    frequencies[char] = float(chdict[char]) / len(cooked_text) 

frequency_list = [frequencies[char] for char in chmap] 
frequency_strlist = ['%.3f' % f for f in frequency_list] 
print(frequency_strlist)
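This answer stops at the per-character frequencies, without the running totals the question asks for. A self-contained sketch of that last step, assuming Python 3: it recomputes the same frequencies with `collections.Counter` (in place of the answer's two dictionaries) and then sums them with `itertools.accumulate`; `cumulative_strlist` is a hypothetical name.

```python
from collections import Counter
from itertools import accumulate
from string import ascii_lowercase

text = 'Some arbitrary Text with NonNSense! @#!.+-'.lower()
chmap = ascii_lowercase + ' '
cooked_text = ''.join(ch for ch in text if ch in chmap)  # letters and spaces only

counts = Counter(cooked_text)  # totals per character, one pass
# Per-character frequencies in chmap order, as in the answer above
frequency_list = [counts[ch] / len(cooked_text) for ch in chmap]

# Running totals of those frequencies, formatted only at the end
cumulative_strlist = ['%.3f' % f for f in accumulate(frequency_list)]
print(cumulative_strlist)
```

Since every character of `cooked_text` belongs to `chmap`, the last running total should come out as `'1.000'`.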