2017-02-25

Why am I getting this error message: dictionary changed size during iteration?

RuntimeError: dictionary changed size during iteration

Here is my code snippet (<= marks the line with the error):

# Probability Distribution from a sequence of tuple tokens
def probdist_from_tokens(tokens, N, V=0, addone=False):
    cfd = ConditionalFreqDist(tokens)
    pdist = {}

    for a in cfd:  # <= line with the error
        pdist[a] = {}
        S = 1 + sum(1 for b in cfd[a] if cfd[a][b] == 1)
        A = sum(cfd[a][b] for b in cfd[a])

        # Add the log probs.
        for b in cfd[a]:
            B = sum(cfd[b][c] for c in cfd[b])
            boff = ((B + 1) / (N + V)) if addone else (B / N)
            pdist[a][b] = math.log((cfd[a][b] + (S * boff)) / (A + S))

        # Add OOV for tag if relevant
        if addone:
            boff = 1 / (N + V)
            pdist[a]["<OOV>"] = math.log((S * boff) / (A + S))

    return pdist

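I am basically just using cfd as a reference to put the correct values into pdist. I don't want to modify cfd; I only want to iterate over its keys and the keys of its sub-dictionaries.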

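I think the problem is caused by the lines that set the variables A and B: I got the same error when I used different code on those lines, but no error when I replaced them with constants.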

Can you provide a self-contained example that demonstrates the problem? – BrenBarn

Answers

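nltk.probability.ConditionalFreqDist inherits from defaultdict, which means that if you read a non-existent entry cfd[b], a new entry (b, FreqDist()) is inserted into the dictionary, changing its size. A demonstration of the problem: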

import collections 
d = collections.defaultdict(int, {'a': 1}) 
for k in d: 
    print(d['b']) 

Output:

0 
Traceback (most recent call last): 
    File "1.py", line 4, in <module> 
    for k in d: 
RuntimeError: dictionary changed size during iteration 
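
For contrast, here is a minimal sketch of which reads leave a defaultdict untouched and which do not. It uses only the standard library; the variable names are illustrative and not part of the original code.

```python
import collections

d = collections.defaultdict(int, {'a': 1})

# Membership tests and .get() do not trigger the default factory,
# so nothing is inserted and iterating over d stays safe.
for k in d:
    assert 'b' not in d
    assert d.get('b', 0) == 0
size_after_safe_reads = len(d)

# Indexing a missing key does insert it. Iterating over a snapshot
# of the keys avoids the RuntimeError, but the dictionary still grows.
for k in list(d):
    d['b']
size_after_indexing = len(d)

print(size_after_safe_reads, size_after_indexing)  # 1 2
```

Iterating over list(d) only hides the symptom; if the dictionary is not supposed to change, a guarded read is probably the better choice.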

So, you should check this line:

for b in cfd[a]:
    B = sum(cfd[b][c] for c in cfd[b])

Are you sure the key b actually exists in cfd? You may want to change it to

 B = sum(cfd[b].values()) if b in cfd else 0 
#        ^~~~~~~~~~~
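
To see the guarded lookup in context, here is a sketch of the whole function with that one line changed. So that it runs without NLTK installed, it assumes a hypothetical stand-in, build_cfd, which builds a defaultdict of Counter objects in place of ConditionalFreqDist; the sample tokens are made up for illustration. The rest of the logic is kept as in the question.

```python
import math
from collections import Counter, defaultdict


def build_cfd(tokens):
    """Hypothetical stand-in for ConditionalFreqDist: condition -> Counter."""
    cfd = defaultdict(Counter)
    for condition, sample in tokens:
        cfd[condition][sample] += 1
    return cfd


def probdist_from_tokens(tokens, N, V=0, addone=False):
    cfd = build_cfd(tokens)
    pdist = {}

    for a in cfd:
        pdist[a] = {}
        S = 1 + sum(1 for count in cfd[a].values() if count == 1)
        A = sum(cfd[a].values())

        for b in cfd[a]:
            # Guarded lookup: only index cfd with keys it already has,
            # so no new entry is created while iterating over it.
            B = sum(cfd[b].values()) if b in cfd else 0
            boff = ((B + 1) / (N + V)) if addone else (B / N)
            pdist[a][b] = math.log((cfd[a][b] + S * boff) / (A + S))

        if addone:
            boff = 1 / (N + V)
            pdist[a]["<OOV>"] = math.log((S * boff) / (A + S))

    return pdist


# Made-up (condition, sample) pairs, for illustration only.
tokens = [("DET", "the"), ("NOUN", "dog"), ("DET", "a"), ("NOUN", "dog")]
pdist = probdist_from_tokens(tokens, N=len(tokens), V=3, addone=True)
```

With the guard in place, the outer loop finishes without raising RuntimeError, because cfd is never indexed with a key it does not already contain.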