I am following the NLP tutorial here (at 6'58''), the part about the Stupid Backoff smoothing algorithm. In both the video and the bi-gram level implementation of stupid backoff, they use a discount value of 0.4.
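For reference, this is the scoring rule as I understand it from the tutorial (the notation is mine; the add-one smoothed unigram fallback matches the code below):

$$
S(w_i \mid w_{i-1}) =
\begin{cases}
\dfrac{c(w_{i-1} w_i)}{c(w_{i-1})} & \text{if } c(w_{i-1} w_i) > 0, \\[6pt]
\alpha \cdot \dfrac{c(w_i) + 1}{N + V} & \text{otherwise,}
\end{cases}
$$

where $c(\cdot)$ is a training-corpus count, $N$ is the total token count (`self.total`), $V$ is the vocabulary size (`self.vocab_size`), and $\alpha = 0.4$ is the discount in question.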
The bi-gram level backoff implementation:
    def score(self, sentence):
        score = 0.0
        previous = sentence[0]
        for token in sentence[1:]:
            bicount = self.bigramCounts[(previous, token)]
            bi_unicount = self.unigramCounts[previous]
            unicount = self.unigramCounts[token]
            if bicount > 0:
                # bigram was seen: use its relative frequency
                score += math.log(bicount)
                score -= math.log(bi_unicount)
            else:
                # back off to the add-one smoothed unigram
                score += math.log(0.4)  # discount here
                score += math.log(unicount + 1)
                score -= math.log(self.total + self.vocab_size)
            previous = token
        return score
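For anyone who wants to run this, here is a minimal self-contained harness; the toy corpus and the count-building constructor are my own scaffolding, not the tutorial's:

    import math
    from collections import defaultdict

    class BigramStupidBackoff:
        def __init__(self, corpus):
            # Build unigram/bigram counts from tokenized sentences
            # (scaffolding assumed here; the tutorial builds these elsewhere).
            self.unigramCounts = defaultdict(int)
            self.bigramCounts = defaultdict(int)
            for sentence in corpus:
                for i, token in enumerate(sentence):
                    self.unigramCounts[token] += 1
                    if i > 0:
                        self.bigramCounts[(sentence[i - 1], token)] += 1
            self.total = sum(self.unigramCounts.values())  # N: total tokens
            self.vocab_size = len(self.unigramCounts)      # V: distinct tokens

        def score(self, sentence):
            # Same logic as the tutorial's bi-gram scorer above.
            score = 0.0
            previous = sentence[0]
            for token in sentence[1:]:
                bicount = self.bigramCounts[(previous, token)]
                bi_unicount = self.unigramCounts[previous]
                unicount = self.unigramCounts[token]
                if bicount > 0:
                    score += math.log(bicount) - math.log(bi_unicount)
                else:
                    score += math.log(0.4)  # the 0.4 discount
                    score += math.log(unicount + 1)
                    score -= math.log(self.total + self.vocab_size)
                previous = token
            return score

    corpus = [['<s>', 'the', 'cat', 'sat', '</s>'],
              ['<s>', 'the', 'dog', 'sat', '</s>']]
    lm = BigramStupidBackoff(corpus)
    print(lm.score(['<s>', 'the', 'cat', 'sat', '</s>']))  # less negative = more likely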
But then in the trigram-level implementation, the discount value is 1 (i.e. the backoff terms are not discounted at all):
    def score(self, sentence):
        score = 0.0
        fst = sentence[0]
        snd = sentence[1]
        for token in sentence[2:]:
            tricount = self.trigramCounts[(fst, snd, token)]
            tri_bicount = self.bigramCounts[(fst, snd)]
            bicount = self.bigramCounts[(snd, token)]
            bi_unicount = self.unigramCounts[snd]
            unicount = self.unigramCounts[token]
            if tricount > 0:
                # trigram was seen: use its relative frequency
                score += math.log(tricount)
                score -= math.log(tri_bicount)
            elif bicount > 0:
                # back off to the bigram
                score += math.log(bicount)  # no discount here
                score -= math.log(bi_unicount)
            else:
                # back off to the add-one smoothed unigram
                score += math.log(unicount + 1)  # no discount here
                score -= math.log(self.total + self.vocab_size)
            fst, snd = snd, token
        return score
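For comparison, if the same 0.4 discount were applied on every backoff step (this is my modification, not the tutorial's code), the branches would read:

    if tricount > 0:
        score += math.log(tricount) - math.log(tri_bicount)
    elif bicount > 0:
        score += math.log(0.4)       # one backoff step: trigram -> bigram
        score += math.log(bicount) - math.log(bi_unicount)
    else:
        score += 2 * math.log(0.4)   # two backoff steps: trigram -> bigram -> unigram
        score += math.log(unicount + 1)
        score -= math.log(self.total + self.vocab_size)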
When I run the project with the discount set to 0.4 versus 1 at the trigram level, the scores come out ordered:

tri-gram with discount = 0.4
    < bi-gram with discount = 0.4
    < tri-gram with discount = 1
It is easy to see why: with discount = 0.4, the final else branch of the trigram scorer effectively becomes:
    else:
        score += math.log(0.4)  # ≈ -0.9163
        score += math.log(0.4)  # ≈ -0.9163, discounted once per backoff step
        score += math.log(unicount + 1)
        score -= math.log(self.total + self.vocab_size)
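Numerically, backing off two levels adds a constant penalty per unseen trigram, which explains the ordering above:

    import math
    one_step = math.log(0.4)        # ≈ -0.9163 for a single backoff step
    two_steps = 2 * math.log(0.4)   # ≈ -1.8326 per unseen trigram (two steps)

Every unseen trigram drags the total log score down by about 1.83, so the 0.4-discount trigram model scores lowest.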
So I am really confused: where does the 0.4 value come from?
You mean the 0.4 in stupid backoff? – user3639557
@user3639557 Yes, but I don't know why it is 0.4, or why they don't use that discount in the trigram example. – user3448806
It is fairly arbitrary, which is why they call it stupid backoff. Read the paper referenced in the answer below. – user3639557