2015-04-01 111 views
-3

Extract parent and child node from Python tree

I am working with NLTK's tree data structure. Below is a sample nltk.Tree.

(S 
    (S 
    (ADVP (RB recently)) 
    (NP (NN someone)) 
    (VP 
     (VBD mentioned) 
     (NP (DT the) (NN word) (NN malaria)) 
     (PP (TO to) (NP (PRP me))))) 
    (, ,) 
    (CC and) 
    (IN so) 
    (S 
    (NP 
     (NP (CD one) (JJ whole) (NN flood)) 
     (PP (IN of) (NP (NNS memories)))) 
    (VP (VBD came) (S (VP (VBG pouring) (ADVP (RB back)))))) 
    (. .)) 

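I am not familiar with the nltk.Tree data structure. For every leaf node I want to extract the labels of its parent and super-parent (grandparent); for example, for 'recently' I want (ADVP, RB), and for 'someone' it is (NP, NN). An earlier answer used the eval() function to do this, which I want to avoid. This is the final result I want: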

[('ADVP', 'RB'), ('NP', 'NN'), ('VP', 'VBD'), ('NP', 'DT'), ('NP', 'NN'), ('NP', 'NN'), ('PP', 'TO'), ('NP', 'PRP'), ('S', 'CC'), ('S', 'IN'), ('NP', 'CD'), ('NP', 'JJ'), ('NP', 'NN'), ('PP', 'IN'), ('NP', 'NNS'), ('VP', 'VBD'), ('VP', 'VBG'), ('ADVP', 'RB')] 
+0

Possible duplicate of "Extract parent and child node from python tree representation" (http://stackoverflow.com/questions/29247241/extract-parent-and-child-node-from-python-tree-representation) – leekaiinthesky 2015-04-06 02:43:36

+0

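@leekaiinthesky That code uses the eval() function, which leads to a stack overflow error. However, I have solved it using the nltk Tree data structure. I am posting my answer below. – rombi 2015-04-06 14:50:29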

Answer

0

Python code that avoids the eval() function and uses the NLTK Tree data structure instead:

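import nltk

sentences = """(S
    (S
      (ADVP (RB recently))
      (NP (NN someone))
      (VP
        (VBD mentioned)
        (NP (DT the) (NN word) (NN malaria))
        (PP (TO to) (NP (PRP me)))))
    (, ,)
    (CC and)
    (IN so)
    (S
      (NP
        (NP (CD one) (JJ whole) (NN flood))
        (PP (IN of) (NP (NNS memories))))
      (VP (VBD came) (S (VP (VBG pouring) (ADVP (RB back))))))
    (. .))"""


def tails(items, path=()):
    """Yield the last two labels on the path from the root to each leaf."""
    for child in items:
        if isinstance(child, nltk.Tree):
            if child.label() in {".", ","}:  # ignore punctuation
                continue
            for result in tails(child, path + (child.label(),)):
                yield result
        else:
            # child is a leaf (a plain string); path ends with its POS tag
            yield path[-2:]


# Parse the bracketed string into an nltk.Tree. Wrapping the tree in a list
# keeps the root label in the path, so top-level words yield e.g. ('S', 'CC').
tree = nltk.Tree.fromstring(sentences)
print(list(tails([tree])))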
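For comparison, here is a minimal dependency-free sketch of the same idea. It is not the NLTK-based approach above and not the method from the linked question; it simply walks the bracketed string with an explicit list of open labels, so it needs neither eval() nor recursion. The function name leaf_contexts and the skip_labels parameter are hypothetical names chosen for illustration, and the sketch assumes well-formed, Penn-Treebank-style input.

```python
def leaf_contexts(bracketed, skip_labels=(".", ",")):
    """Yield (grandparent, parent) label pairs for each leaf, left to right.

    Assumes a well-formed bracketed string such as "(S (NP (NN someone)))".
    """
    # Pad parentheses with spaces so a plain split() tokenizes the string.
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    path = []             # labels of the constituents that are currently open
    expect_label = False  # True right after "(": the next token is a label
    for token in tokens:
        if token == "(":
            expect_label = True
        elif token == ")":
            path.pop()
        elif expect_label:
            path.append(token)
            expect_label = False
        elif path[-1] not in skip_labels:
            # Any other token is a leaf word; skip it if its tag is punctuation.
            yield tuple(path[-2:])


sample = "(S (NP (NN someone)) (VP (VBD left)) (. .))"
print(list(leaf_contexts(sample)))
# prints [('NP', 'NN'), ('VP', 'VBD')]
```

Because the open labels are kept in an ordinary list rather than on the call stack, deeply nested trees should not run into Python's recursion limit.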