2015-02-24 75 views
0

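I want to get a flattened tree, like the one given below, from a tree structure. How can I flatten a parse tree and store it in a string for further string manipulation in Python NLTK?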

[Image: parse tree]

I want to get the whole tree as a single string like this, without getting a "bad tree detected" error:

((S (NP-SBJ (NP (DT The) (JJ high) (JJ seven-day))(PP (IN of) (NP (DT the) (CD 400) (NNS money))))(VP (VBD was) (NP-PRD (CD 8.12) (NN %))(, ,) (ADVP (RB down) (PP (IN from) (NP (CD 8.14) (NN %)))))(. .))) 
+1

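Why do you want to do this? It only makes the data harder to work with. Trees are easy to handle and give you a lot of structure that you would otherwise have to reconstruct from the text. – 2015-03-15 09:29:57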

Answers

2

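The documentation provides a pprint() method that renders the tree as a bracketed string, which can then be flattened into a single line. (In NLTK 3 and later, pprint() prints the tree and returns None; pformat() returns the string instead.)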

Parsing this text:

string = "My name is Ross and I am cool. What's going on world? I'm looking for friends." 

and then calling pprint() produces the following:

u"(NP+SBAR+S\n (S\n (NP (PRP$ my) (NN name))\n (VP\n  (VBZ is)\n  (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.))\n  (SBAR\n  (WHNP (WP What))\n  (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world)))))\n (. ?))\n (S\n (NP (PRP I))\n (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends))))\n (. .)))" 

From here, if you want to remove the tabs and newlines, you can use the following split and join:

# NLTK 3+: pformat() returns the string; in NLTK 2.x use tree.pprint() instead
splitted = tree.pformat().split() 
flat_tree = ' '.join(splitted) 

Running this gives me:

u"(NP+SBAR+S (S (NP (PRP$ my) (NN name)) (VP (VBZ is) (NP (NNP Ross) (CC and) (PRP I) (JJ am) (NN cool.)) (SBAR (WHNP (WP What)) (S+VP (VBZ 's) (VBG going) (NP (IN on) (NN world))))) (. ?)) (S (NP (PRP I)) (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends)))) (. .)))" 
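The split-and-join step is plain string handling, so it can be tried without NLTK. The following is a minimal sketch, assuming the pretty-printed tree is already available as a string; the helper name `flatten_bracketed` and the shortened sample tree are illustrative, not part of the answer above.

```python
def flatten_bracketed(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no argument splits on any whitespace run and drops
    # empty pieces, so joining with a single space yields a one-line string.
    return " ".join(text.split())


# Hypothetical sample: a pretty-printed bracketed tree spread over several lines.
pretty = (
    "(S\n"
    "  (NP (PRP I))\n"
    "  (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends))))\n"
    "  (. .))"
)

flat = flatten_bracketed(pretty)
print(flat)
```

Because only whitespace is touched, the brackets and labels are left exactly as they were, so the flat string should still be readable by a bracketed-tree parser.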
1

Tree manipulation and node extraction with Python NLTK:

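from nltk.tree import Tree 
# `trees` is assumed to be an iterable of nltk Tree objects
for tr in trees: 
    tr1 = str(tr)              # bracketed string form of the tree
    s1 = Tree.fromstring(tr1)  # parse the string back into a Tree
    s2 = s1.productions()      # grammar productions (node expansions)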
1

You can use the str function, then split and join, to convert the tree to a single-line string as follows:

parse_string = ' '.join(str(tree).split()) 

print(parse_string) 
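Once the tree is a flat string, ordinary string tools apply to it. As one possible illustration of the "further string manipulation" the question asks about, here is a rough sketch that pulls (tag, word) pairs out of the flat string with a regular expression. The names `LEAF` and `leaf_pairs` are made up for this example, and it assumes that tags and words contain no spaces or parentheses.

```python
import re

# A leaf looks like "(TAG word)": an opening bracket, a tag, one space, a word,
# and a closing bracket, with no nested brackets in between.
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def leaf_pairs(flat_tree):
    """Return the (tag, word) pairs of all leaves, in left-to-right order."""
    return LEAF.findall(flat_tree)


flat = "(S (NP (PRP I)) (VP (VBP 'm) (VBG looking) (PP (IN for) (NP (NNS friends)))) (. .))"
pairs = leaf_pairs(flat)
words = [word for _, word in pairs]
print(pairs)
print(" ".join(words))
```

For anything beyond simple extraction, working on the Tree object itself is usually less fragile than pattern matching on the string.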