2011-10-01

NLTK chunking and walking the result tree

I'm using NLTK's RegexpParser to extract noun groups and verb groups from POS-tagged tokens.

How do I walk the resulting tree so that I find only the chunks that are NP or V groups?

from nltk.chunk import RegexpParser 

grammar = ''' 
NP: {<DT>?<JJ>*<NN>*} 
V: {<V.*>}''' 
chunker = RegexpParser(grammar) 
token = [] ## Some tokens from my POS tagger 
chunked = chunker.parse(tokens) 
print(chunked) 

#How do I walk the tree? 
#for chunk in chunked: 
# if chunk.??? == 'NP': 
#   print chunk 

(S (NP carrier/NN) for/IN tissue-/JJ and/CC cell-culture/JJ for/IN (NP the/DT preparation/NN) of/IN (NP implants/NNS) and/CC (NP implant/NN) (V containing/VBG) (NP the/DT carrier/NN) ./.)
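The tokens list in the question is left empty. RegexpParser.parse expects a list of (word, tag) tuples, so one way to reproduce the run without a tagger is to rebuild the tokens from the word/TAG pairs in the sample output with nltk.tag.str2tuple. This is only a sketch: the tagged text is copied from the output above rather than produced by a real POS tagger.

```python
from nltk.chunk import RegexpParser
from nltk.tag import str2tuple

# Same grammar as in the question
grammar = '''
NP: {<DT>?<JJ>*<NN>*}
V: {<V.*>}'''
chunker = RegexpParser(grammar)

# word/TAG pairs copied from the sample output (assumption: stands in for tagger output)
tagged_text = ("carrier/NN for/IN tissue-/JJ and/CC cell-culture/JJ for/IN "
               "the/DT preparation/NN of/IN implants/NNS and/CC implant/NN "
               "containing/VBG the/DT carrier/NN ./.")

# str2tuple splits each item on its last '/' into a (word, TAG) tuple
tokens = [str2tuple(item) for item in tagged_text.split()]
chunked = chunker.parse(tokens)
print(chunked)
```

The chunks may not match the sample output exactly: in NLTK tag patterns <NN> matches only the tag NN (not NNS), and a lone <JJ> also satisfies this NP rule. A pattern such as <NN.*> would cover both NN and NNS.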

Answers

This should work:

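import nltk

for n in chunked: 
    if isinstance(n, nltk.tree.Tree): 
        # n is a chunk subtree; its label is 'NP' or 'V' from the grammar
        if n.label() == 'NP': 
            do_something_with_subtree(n) 
    else: 
        # n is a plain (word, tag) tuple that no rule chunked
        do_something_with_leaf(n) 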

This gives me AttributeError: 'tuple' object has no attribute 'node'; n is <type 'tuple'>.

I've edited the answer.

Works like a charm, thanks!
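A self-contained version of the loop from this answer, for trying it without a POS tagger. The hand-tagged tokens and the list names (np_chunks, v_chunks, unchunked) are illustrative assumptions. It shows the point raised in the comments: top-level items that no rule chunked come back as plain (word, tag) tuples, so the isinstance check has to come before calling label().

```python
from nltk.chunk import RegexpParser
from nltk.tree import Tree

grammar = '''
NP: {<DT>?<JJ>*<NN>*}
V: {<V.*>}'''
chunker = RegexpParser(grammar)

# Hypothetical hand-tagged input standing in for POS-tagger output
tokens = [("the", "DT"), ("carrier", "NN"), ("containing", "VBG"),
          ("the", "DT"), ("implant", "NN"), (".", ".")]
chunked = chunker.parse(tokens)

np_chunks, v_chunks, unchunked = [], [], []
for node in chunked:
    if isinstance(node, Tree):
        # A chunk subtree: its label comes from the grammar ("NP" or "V")
        if node.label() == "NP":
            np_chunks.append(node)
        elif node.label() == "V":
            v_chunks.append(node)
    else:
        # A bare (word, tag) tuple that no rule chunked
        unchunked.append(node)

for chunk in np_chunks:
    # leaves() returns the (word, tag) tuples inside the chunk
    print("NP:", " ".join(word for word, tag in chunk.leaves()))
```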

Small mistake: the variable is defined as token but passed to parse() as tokens.

from nltk.chunk import RegexpParser 

grammar = '''
NP: {<DT>?<JJ>*<NN>*} 
V: {<V.*>}'''
chunker = RegexpParser(grammar) 
token = []  # Some tokens from my POS tagger 
# chunked = chunker.parse(tokens)  # token is defined on the previous line, but tokens was passed here
chunked = chunker.parse(token)  # changed line
print(chunked) 

Savino's answer is great, but it's also worth noting that subtrees can be accessed by index as well, e.g.:

for n in range(len(chunked)): 
    do_something_with_subtree(chunked[n])
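Index access (like the plain for loop) only visits the top level of the tree. If chunks might be nested, for example with a cascaded grammar, Tree.subtrees() with a filter walks every level recursively. A minimal sketch, again assuming hand-tagged tokens in place of a real tagger:

```python
from nltk.chunk import RegexpParser

grammar = '''
NP: {<DT>?<JJ>*<NN>*}
V: {<V.*>}'''
chunker = RegexpParser(grammar)

# Hypothetical hand-tagged input
tokens = [("the", "DT"), ("carrier", "NN"), ("containing", "VBG"),
          ("the", "DT"), ("implant", "NN")]
chunked = chunker.parse(tokens)

# subtrees() yields the root "S" and every nested Tree (never the tuple leaves);
# the filter keeps only subtrees labelled "NP"
np_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in chunked.subtrees(filter=lambda t: t.label() == "NP")
]
print(np_phrases)
```

Because the filter is only ever called on Tree objects, there is no need for an isinstance check here.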