Extracting text from an output parse tree

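I am new to NLP. I am trying to use the Stanford parser to extract (NP) phrases from a text, that is, I want to retrieve the parts of the text that are tagged (NP).

If a part is tagged (NP) and a smaller part inside it is also tagged (NP), I want to take the smaller part.

So far I have managed to do what I want with the following method: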

private static ArrayList<Tree> extract(Tree t) 
{ 
    ArrayList<Tree> wanted = new ArrayList<Tree>(); 
    if (t.label().value().equals("NP")) 
    { 
     wanted.add(t); 
     for (Tree child : t.children()) 
     { 
      ArrayList<Tree> temp = new ArrayList<Tree>(); 
      temp=extract(child); 
      if(temp.size()>0) 
      { 
       int o=-1; 
       o=wanted.indexOf(t); 
       if(o!=-1) 
        wanted.remove(o); 
      } 
      wanted.addAll(temp); 
     } 
    } 

    else 
     for (Tree child : t.children()) 
      wanted.addAll(extract(child)); 
    return wanted; 
}

The method returns a list of trees. When I do the following:

LexicalizedParser parser = LexicalizedParser.loadModel(); 
Tree x = parser.apply("Who owns club barcelona?"); 
ArrayList<Tree> outs = extract(x); 
for (int i = 0; i < outs.size(); i++) { 
    System.out.println("tree #" + i + ": " + outs.get(i)); 
}

The output is:

tree #0: (NP (NN club) (NN barcelona))

I want the output to be just "club barcelona", without the tags. I tried the .labels() and .label().value() methods, but they return the tags instead of the words.

2012-09-20 SKandeel

You can get the list of words under a subtree tr with

tr.yield()

You can convert that to plain String form with a convenience method in Sentence:

Sentence.listToString(tr.yield())
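To make the idea of a yield concrete, here is a minimal, self-contained sketch. It does not use the Stanford classes: Node, yieldWords and yieldText are hypothetical names invented for illustration. It only shows the general idea (collect the leaves left to right, then join them with spaces), not how the library implements it.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: Node is a hypothetical stand-in, NOT the Stanford Tree class.
class YieldSketch {
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Collect the leaf labels (the words) under n, left to right.
    static List<String> yieldWords(Node n) {
        List<String> words = new ArrayList<>();
        collect(n, words);
        return words;
    }

    private static void collect(Node n, List<String> out) {
        if (n.isLeaf()) {
            out.add(n.label);
            return;
        }
        for (Node c : n.children) {
            collect(c, out);
        }
    }

    // Join the words with single spaces, dropping all tags.
    static String yieldText(Node n) {
        return String.join(" ", yieldWords(n));
    }

    public static void main(String[] args) {
        // Hand-built equivalent of (NP (NN club) (NN barcelona))
        Node np = new Node("NP",
                new Node("NN", new Node("club")),
                new Node("NN", new Node("barcelona")));
        System.out.println(yieldText(np)); // prints: club barcelona
    }
}
```

The tags live on the internal nodes and the words live on the leaves, which is why reading labels off the NP node gives "NP" rather than the text.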

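You can just walk the tree as you are doing, but if you are going to do this kind of thing much, you might want to look at tregex, which makes it much easier to find particular nodes in trees via declarative patterns, such as NPs with no NP below them. A neat way to do what you are looking for is this: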

Tree x = lp.apply("Christopher Manning owns club barcelona?"); 
TregexPattern NPpattern = TregexPattern.compile("@NP !<< @NP"); 
TregexMatcher matcher = NPpattern.matcher(x); 
while (matcher.findNextMatchingNode()) { 
    Tree match = matcher.getMatch(); 
    System.out.println(Sentence.listToString(match.yield())); 
}
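For readers who want to see what a pattern like "@NP !<< @NP" selects, the following self-contained sketch spells out the same condition by hand: an NP node that dominates no other NP node. It is only an approximation under stated assumptions. Node, dominates and innermostNpTexts are hypothetical names, not part of the Stanford API, and the label test is an exact string match, whereas tregex's @ matches on the basic category.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: Node is a hypothetical stand-in, NOT the Stanford Tree class.
class InnermostNpSketch {
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // True if some non-leaf proper descendant of n carries the given label.
    static boolean dominates(Node n, String label) {
        for (Node c : n.children) {
            if ((!c.isLeaf() && c.label.equals(label)) || dominates(c, label)) {
                return true;
            }
        }
        return false;
    }

    // Words under n, left to right, joined with spaces.
    static String text(Node n) {
        if (n.isLeaf()) {
            return n.label;
        }
        List<String> parts = new ArrayList<>();
        for (Node c : n.children) {
            parts.add(text(c));
        }
        return String.join(" ", parts);
    }

    // Pre-order walk: keep each NP that has no NP anywhere below it.
    static List<String> innermostNpTexts(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, out);
        return out;
    }

    private static void walk(Node n, List<String> out) {
        if (!n.isLeaf() && n.label.equals("NP") && !dominates(n, "NP")) {
            out.add(text(n));
        }
        for (Node c : n.children) {
            walk(c, out);
        }
    }

    public static void main(String[] args) {
        // Hand-built tree for "the owner of club barcelona smiles"
        Node tree = new Node("S",
                new Node("NP",
                        new Node("NP",
                                new Node("DT", new Node("the")),
                                new Node("NN", new Node("owner"))),
                        new Node("PP",
                                new Node("IN", new Node("of")),
                                new Node("NP",
                                        new Node("NN", new Node("club")),
                                        new Node("NN", new Node("barcelona"))))),
                new Node("VP", new Node("VBZ", new Node("smiles"))));
        // The outer NP is skipped because it contains smaller NPs.
        for (String s : innermostNpTexts(tree)) {
            System.out.println(s);
        }
    }
}
```

With the real library, the declarative pattern is shorter and less error-prone than a hand-written walk like this one.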

2012-09-21 05:37:13
