2016-08-02 88 views

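Finding the grandparent node of a node using nltk

I am using the Tree package from nltk with Python 2.7, and I want to extract every rule from a tree together with its grandparent node. I have the following tree: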

t = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]), Tree('VP', [Tree('V', ['chased']), Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])]) 

and its productions:

t.productions()
    [S -> NP VP, NP -> D N, D -> 'the', N -> 'dog', VP -> V NP, V -> 'chased', NP -> D N, D -> 'the', N -> 'cat'] 

              S
      ________|_____
     |              VP
     |         _____|___
     NP       |         NP
  ___|___     |      ___|___
 D       N    V     D       N
 |       |    |     |       |
the     dog chased the     cat

What I want is something of the form:

[S -> NP VP, S^NP -> D N, NP^D -> 'the', NP^N -> 'dog', ...]

I have looked at the ParentedTree class, but I could not work out how to use it to solve my problem.

Answer


You need to modify/override the productions method so that it receives the parent's label.

Code:

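# Written for Python 2.7 with an older NLTK release; nltk.compat.string_types
# and the private helper nltk.tree._child_names may not be importable in
# newer NLTK versions.
from nltk.tree import Tree
from nltk.compat import string_types
from nltk.grammar import Production, Nonterminal
from nltk.tree import _child_names

def productions(t, parent):
    """Return the productions of t, prefixing each left-hand side with its parent's label."""
    if not isinstance(t._label, string_types):
        raise TypeError('Productions can only be generated from trees having node labels that are strings')

    # The left-hand side t._label becomes parent + "^" + t._label
    prods = [Production(Nonterminal(parent + "^" + t._label), _child_names(t))]
    for child in t:
        if isinstance(child, Tree):
            # Recurse, passing this node's label down as the child's parent
            prods += productions(child, t._label)
    return prods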


t = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]), Tree('VP', [Tree('V', ['chased']), Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])]) 

# To add 'Start' as the parent of 'S':
# prods = productions(t, "Start")

# To skip the parent of 'S' (the root keeps its plain label):
prods = [Production(Nonterminal(t._label), _child_names(t))]
for child in t:
    if isinstance(child, Tree):
        prods += productions(child, t._label)

print(prods)

Output:

[S -> NP VP, S^NP -> D N, NP^D -> 'the', 
    NP^N -> 'dog', S^VP -> V NP, VP^V -> 'chased', 
    VP^NP -> D N, NP^D -> 'the', NP^N -> 'cat'] 
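
The labelling rule behind this output can also be sketched without NLTK, which may help when checking the recursion on its own. The block below is only an illustration: the nested-tuple tree format and the function name annotated_rules are assumptions made for this sketch and are not part of NLTK.

```python
# Hypothetical stand-in for an NLTK tree: (label, [children]).
# Leaves are plain strings, subtrees are tuples.
def annotated_rules(node, parent=None):
    """Return rules as strings; every non-root label is prefixed with its parent's label."""
    label, children = node
    lhs = label if parent is None else parent + "^" + label
    # Right-hand side: bare labels for subtrees, quoted words for leaves.
    rhs = [child[0] if isinstance(child, tuple) else repr(child)
           for child in children]
    rules = [lhs + " -> " + " ".join(rhs)]
    for child in children:
        if isinstance(child, tuple):
            # Pass the current label down as the child's parent.
            rules += annotated_rules(child, label)
    return rules


tree = ("S", [
    ("NP", [("D", ["the"]), ("N", ["dog"])]),
    ("VP", [("V", ["chased"]),
            ("NP", [("D", ["the"]), ("N", ["cat"])])]),
])

rules = annotated_rules(tree)
for rule in rules:
    print(rule)
```

The root is called with parent=None, so it keeps its plain label, which corresponds to the "skip the parent of 'S'" variant in the answer's code.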

For more details, see the productions method in nltk.tree.
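
As a possible alternative, NLTK's ParentedTree class keeps a reference from each subtree to its parent, so the same annotation could be built from public methods only. This is a minimal sketch assuming NLTK 3 (where label(), parent(), subtrees() and ParentedTree.convert are available); the function name parent_annotated_productions is made up for this example.

```python
from nltk.grammar import Nonterminal, Production
from nltk.tree import ParentedTree, Tree


def parent_annotated_productions(tree):
    """Build one production per subtree, prefixing non-root labels with the parent's label."""
    ptree = ParentedTree.convert(tree)  # copy of the tree with parent pointers
    prods = []
    for sub in ptree.subtrees():  # pre-order: the root first, then its descendants
        parent = sub.parent()  # None for the root
        if parent is None:
            lhs = sub.label()
        else:
            lhs = parent.label() + "^" + sub.label()
        # Children that are trees become nonterminals; leaves stay as strings.
        rhs = [Nonterminal(child.label()) if isinstance(child, Tree) else child
               for child in sub]
        prods.append(Production(Nonterminal(lhs), rhs))
    return prods


t = Tree('S', [Tree('NP', [Tree('D', ['the']), Tree('N', ['dog'])]),
               Tree('VP', [Tree('V', ['chased']),
                           Tree('NP', [Tree('D', ['the']), Tree('N', ['cat'])])])])

prods = parent_annotated_productions(t)
print(prods)
```

This version avoids the private _child_names helper and the recursive parent argument, at the cost of converting the tree to a ParentedTree first.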