2017-03-16

I have the sentence "John saw a flashy hat at the store".
How can I represent it as a dependency tree, as shown below, using spaCy?

(S
  (NP (NNP John))
  (VP
    (VBD saw)
    (NP (DT a) (JJ flashy) (NN hat))
    (PP (IN at) (NP (DT the) (NN store)))))

I took the following script from another source:

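import spacy
from nltk import Tree

# Older spaCy releases used the 'en' shortcut; current ones load a named
# model such as 'en_core_web_sm' (install it first).
en_nlp = spacy.load('en_core_web_sm')

doc = en_nlp("John saw a flashy hat at the store")

def to_nltk_tree(node):
    # A token with children becomes a Tree; a leaf token is just its text.
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_

# Print one tree per sentence.
for sent in doc.sents:
    to_nltk_tree(sent.root).pretty_print()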

This is the output I get, but I am looking for a tree in the (NLTK) bracketed format shown above.

           saw
  __________|__________
 |          |          at
 |          |          |
 |         hat       store
 |      ____|____      |
John    a     flashy  the

Answers


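Text representations aside, what you are trying to achieve is to get a constituency tree out of a dependency graph. Your example of the desired output is a classic constituency tree (as in phrase structure grammar, as opposed to dependency grammar).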
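To make the distinction concrete, here is a small sketch that puts the two representations of the question's sentence side by side. The constituency tree is the bracketed string from the question, parsed with nltk's Tree.fromstring; the dependency arcs are written out by hand from the spaCy output shown in the question (the names constituency and dependency_arcs are only illustrative).

```python
from nltk import Tree

# Constituency view: the bracketed tree from the question.
constituency = Tree.fromstring("""
(S
  (NP (NNP John))
  (VP
    (VBD saw)
    (NP (DT a) (JJ flashy) (NN hat))
    (PP (IN at) (NP (DT the) (NN store)))))
""")

# Dependency view: (head, dependent) pairs read off the spaCy output
# shown in the question; "saw" is the root.
dependency_arcs = [
    ("saw", "John"), ("saw", "hat"), ("saw", "at"),
    ("hat", "a"), ("hat", "flashy"),
    ("at", "store"), ("store", "the"),
]

# Internal nodes of the constituency tree are phrase labels, not words
# (subtrees of height 2 are the part-of-speech preterminals, so skip them).
phrase_labels = sorted({sub.label() for sub in constituency.subtrees()
                        if sub.height() > 2})
print(phrase_labels)
print(constituency.leaves())

# In the dependency view every node is a word of the sentence.
dependency_nodes = {word for arc in dependency_arcs for word in arc}
print(dependency_nodes == set(constituency.leaves()))
```

The constituency tree introduces phrase nodes (S, NP, VP, PP) above the words, while the dependency graph only links words to other words, which is why the second cannot simply be reprinted as the first.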

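While conversion from constituency trees to dependency graphs is more or less an automated task (for example, http://www.mathcs.emory.edu/~choi/doc/clear-dependency-2012.pdf), the other direction is not. There has been work on it: see the PAD project, https://github.com/ikekonglp/PAD, and the paper describing the underlying algorithm: http://homes.cs.washington.edu/~nasmith/papers/kong+rush+smith.naacl15.pdf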
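As a rough illustration of why the constituency-to-dependency direction is mechanical, the sketch below applies a tiny head-rule table to the bracketed tree from the question. The HEAD_RULES table, the nested-tuple encoding, and the to_dependencies helper are all assumptions made up for this one sentence; real converters such as the one linked above use far richer rules, and this toy version identifies words by their string, so it would break on repeated words.

```python
# Toy head rules (an assumption for this sentence only): for each phrase
# label, the child labels to try, in priority order, when picking the head.
HEAD_RULES = {
    "S": ["VP"],
    "VP": ["VBD"],
    "NP": ["NN", "NNP"],
    "PP": ["IN"],
}

# The bracketed tree from the question as nested tuples: (label, children...),
# with preterminals written as (tag, word).
TREE = ("S",
        ("NP", ("NNP", "John")),
        ("VP",
         ("VBD", "saw"),
         ("NP", ("DT", "a"), ("JJ", "flashy"), ("NN", "hat")),
         ("PP", ("IN", "at"), ("NP", ("DT", "the"), ("NN", "store")))))


def to_dependencies(node, arcs):
    """Return the head word of `node`, appending (head, dependent) arcs."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]  # preterminal: the word is its own head
    heads = [to_dependencies(child, arcs) for child in children]
    # Pick the head child by the first matching rule, else the first child.
    head_index = 0
    for wanted in HEAD_RULES.get(label, []):
        matches = [i for i, child in enumerate(children) if child[0] == wanted]
        if matches:
            head_index = matches[0]
            break
    # Every non-head child's head word depends on the head child's head word.
    for i, word in enumerate(heads):
        if i != head_index:
            arcs.append((heads[head_index], word))
    return heads[head_index]


arcs = []
root = to_dependencies(TREE, arcs)
print(root)
print(sorted(arcs))
```

With these particular rules the arcs match the spaCy tree printed in the question. Going the other way would require inventing the phrase labels and bracketing that the dependency graph does not contain, which is the hard part.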

You may also want to reconsider whether you really need a constituency parse. Here is a good argument: https://linguistics.stackexchange.com/questions/7280/why-is-constituency-needed-since-dependency-gets-the-job-done-more-easily-and-e


To re-create an NLTK-style tree for a SpaCy dependency parse, try using the draw method from nltk.tree instead of pretty_print:

import spacy
from nltk.tree import Tree

# Older spaCy releases used the "en" shortcut; current ones load a named
# model such as "en_core_web_sm" (install it first).
spacy_nlp = spacy.load("en_core_web_sm")

def nltk_spacy_tree(text):
    """
    Visualize the SpaCy dependency tree with nltk.tree
    """
    doc = spacy_nlp(text)

    def token_format(token):
        # Label each node as word_TAG_dependency
        return "_".join([token.orth_, token.tag_, token.dep_])

    def to_nltk_tree(node):
        if node.n_lefts + node.n_rights > 0:
            return Tree(token_format(node),
                        [to_nltk_tree(child) for child in node.children])
        else:
            return token_format(node)

    trees = [to_nltk_tree(sent.root) for sent in doc.sents]
    # The first item in the list is the tree for the first sentence
    trees[0].draw()
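Since draw() opens a Tkinter window, it needs a graphical display. If a plain bracketed string is enough, a text-only variant could look like the sketch below. The to_bracketed function is a hypothetical helper, not part of spaCy or NLTK; it relies only on the .orth_ and .children attributes that spaCy tokens expose, and the fake function builds stand-in nodes so the example runs without a spaCy model.

```python
from types import SimpleNamespace


def to_bracketed(node):
    """Render a dependency subtree as a bracketed string, head first."""
    children = list(node.children)
    if not children:
        return node.orth_
    parts = [node.orth_] + [to_bracketed(child) for child in children]
    return "(" + " ".join(parts) + ")"


def fake(word, *children):
    """Hypothetical stand-in for a spaCy Token (only .orth_ and .children)."""
    return SimpleNamespace(orth_=word, children=list(children))


# Hand-built copy of the dependency tree printed in the question.
root = fake("saw",
            fake("John"),
            fake("hat", fake("a"), fake("flashy")),
            fake("at", fake("store", fake("the"))))

bracketed = to_bracketed(root)
print(bracketed)
# (saw John (hat a flashy) (at (store the)))
```

With a real parse, the same function would be called as to_bracketed(sent.root) for each sent in doc.sents. Note that the labels are still words, so this is a dependency tree in bracket notation, not a constituency tree.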

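Note that because SpaCy currently only supports dependency parsing and tagging at the word and noun-phrase level, SpaCy trees won't be as deeply structured as the ones you would get from, for instance, the Stanford parser, which you can also visualize as a tree: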

from nltk.tree import Tree
from nltk.parse.stanford import StanfordParser

# Note: Download Stanford jar dependencies first
# See https://stackoverflow.com/questions/13883277/stanford-parser-and-nltk
# Newer NLTK releases deprecate this interface in favor of
# nltk.parse.corenlp.CoreNLPParser.
stanford_parser = StanfordParser(
    model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"
)

def nltk_stanford_tree(sent):
    """
    Visualize the Stanford constituency parse tree with nltk.tree
    """
    parse = stanford_parser.raw_parse(sent)
    tree = list(parse)
    # The first item in the list is the full tree
    tree[0].draw()

Now, if we run both, nltk_spacy_tree("John saw a flashy hat at the store.") will draw the SpaCy dependency tree, and nltk_stanford_tree("John saw a flashy hat at the store.") will draw the more deeply structured Stanford tree.