CoreNLP Stanford Dependency Format

Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas

From the above sentence, I expect to get the following typed dependencies:

nsubjpass(submitted, Bills) 
auxpass(submitted, were) 
agent(submitted, Brownback) 
nn(Brownback, Senator) 
appos(Brownback, Republican) 
prep_of(Republican, Kansas) 
prep_on(Bills, ports) 
conj_and(ports, immigration) 
prep_on(Bills, immigration)

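This should be possible according to Table 1, Figure 1 of the Stanford Dependencies documentation.

Using the code below, I have only been able to get the following dependencies (this is what the code outputs):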

root(ROOT-0, submitted-7) 
nmod:on(Bills-1, ports-3) 
nmod:on(Bills-1, immigration-5) 
case(ports-3, on-2) 
cc(ports-3, and-4) 
conj:and(ports-3, immigration-5) 
nsubjpass(submitted-7, Bills-1) 
auxpass(submitted-7, were-6) 
nmod:agent(submitted-7, Brownback-10) 
case(Brownback-10, by-8) 
compound(Brownback-10, Senator-9) 
punct(Brownback-10, ,-11) 
appos(Brownback-10, Republican-12) 
nmod:of(Republican-12, Kansas-14) 
case(Kansas-14, of-13)

Question: how can I achieve the desired output above?

Code

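import java.util.Collection;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationPipeline;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.EnhancedPlusPlusDependenciesAnnotation;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.PropertiesUtils;

public void processTestCoreNLP() {
    String text = "Bills on ports and immigration were submitted " +
        "by Senator Brownback, Republican of Kansas";

    Annotation annotation = new Annotation(text);
    Properties properties = PropertiesUtils.asProperties(
        "annotators", "tokenize,ssplit,pos,lemma,depparse"
    );

    AnnotationPipeline pipeline = new StanfordCoreNLP(properties);

    pipeline.annotate(annotation);

    // Print every typed dependency of each sentence (enhanced++ UD graph).
    for (CoreMap sentence : annotation.get(SentencesAnnotation.class)) {
        SemanticGraph sg = sentence.get(EnhancedPlusPlusDependenciesAnnotation.class);
        Collection<TypedDependency> dependencies = sg.typedDependencies();
        for (TypedDependency td : dependencies) {
            System.out.println(td);
        }
    }
}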

Source

2017-07-19 gimg1

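What does the code actually print out, then? – errantlinguist

Apologies for being unclear. The code outputs the second block of dependencies. I have edited the question to make this clearer. – gimg1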

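If you want to get CC-processed and collapsed Stanford Dependencies (SD) from the NN dependency parser, you have to set a property to work around a small bug in CoreNLP.

Note, however, that we no longer maintain the Stanford Dependencies code. Unless you have a very good reason to use SD, we recommend Universal Dependencies for any new project. See the Universal Dependencies (UD) documentation and Schuster and Manning (2016) for more information on the UD representation.

To get the CC-processed and collapsed SD representation, set the depparse.language property as follows: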

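import java.util.Collection;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.AnnotationPipeline;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.PropertiesUtils;

public void processTestCoreNLP() {
    String text = "Bills on ports and immigration were submitted " +
        "by Senator Brownback, Republican of Kansas";

    Annotation annotation = new Annotation(text);
    Properties properties = PropertiesUtils.asProperties(
        "annotators", "tokenize,ssplit,pos,lemma,depparse");

    // Workaround: this property makes depparse produce Stanford Dependencies.
    properties.setProperty("depparse.language", "English");

    AnnotationPipeline pipeline = new StanfordCoreNLP(properties);

    pipeline.annotate(annotation);

    // Read the collapsed, CC-processed graph instead of the enhanced++ one.
    for (CoreMap sentence : annotation.get(SentencesAnnotation.class)) {
        SemanticGraph sg = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
        Collection<TypedDependency> dependencies = sg.typedDependencies();
        for (TypedDependency td : dependencies) {
            System.out.println(td);
        }
    }
}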

Source

2017-07-27 19:43:08

Thanks, Sebastien. This is exactly what I was looking for. I even searched the mailing list but did not come across this. – gimg1

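CoreNLP recently switched from the old Stanford Dependencies format (the format in the top example) to Universal Dependencies. My first recommendation would be to use the new format if at all possible. Continued development on the parser will use Universal Dependencies, and the format is in many ways similar to the old one, modulo some modifications (e.g., prep -> nmod).

However, if you do want the old dependency format, you can get it with the CollapsedCCProcessedDependenciesAnnotation annotation.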
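If only the old label names are needed for output like the block printed in the question, the renaming can be approximated by post-processing the printed lines. The following is a minimal sketch under stated assumptions, not a CoreNLP feature: the class name UdToSdLabelSketch and its mapping rules are hypothetical and cover only the relations that occur in this example (nmod:X to prep_X, nmod:agent to agent, conj:X to conj_X, compound to nn, with root, case, cc and punct dropped). It is not a general UD-to-SD conversion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites printed UD dependency lines into SD-style labels. */
public class UdToSdLabelSketch {
    // Matches lines such as "nmod:on(Bills-1, ports-3)".
    private static final Pattern DEP =
        Pattern.compile("^([^()]+)\\((.+)-(\\d+), (.+)-(\\d+)\\)$");

    /** Returns the SD-style line, or null when the relation is dropped by this sketch. */
    public static String convert(String udLine) {
        Matcher m = DEP.matcher(udLine.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized dependency line: " + udLine);
        }
        String rel = m.group(1);
        String gov = m.group(2);
        String dep = m.group(4);
        String sdRel;
        switch (rel) {
            case "root":
            case "case":
            case "cc":
            case "punct":
                // Assumption: these have no counterpart in the desired collapsed list.
                return null;
            case "nmod:agent":
                sdRel = "agent";
                break;
            case "compound":
                sdRel = "nn";
                break;
            default:
                if (rel.startsWith("nmod:")) {
                    sdRel = "prep_" + rel.substring("nmod:".length());
                } else if (rel.startsWith("conj:")) {
                    sdRel = "conj_" + rel.substring("conj:".length());
                } else {
                    sdRel = rel; // e.g. nsubjpass, auxpass, appos are kept as-is
                }
        }
        return sdRel + "(" + gov + ", " + dep + ")";
    }

    /** Converts every line and skips the dropped relations. */
    public static List<String> convertAll(List<String> udLines) {
        List<String> out = new ArrayList<>();
        for (String line : udLines) {
            String converted = convert(line);
            if (converted != null) {
                out.add(converted);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ud = Arrays.asList(
            "root(ROOT-0, submitted-7)",
            "nmod:on(Bills-1, ports-3)",
            "case(ports-3, on-2)",
            "conj:and(ports-3, immigration-5)",
            "nmod:agent(submitted-7, Brownback-10)",
            "compound(Brownback-10, Senator-9)",
            "punct(Brownback-10, ,-11)",
            "nmod:of(Republican-12, Kansas-14)");
        for (String line : convertAll(ud)) {
            System.out.println(line);
        }
    }
}
```

Because this works on label strings only, it cannot recover anything the UD graph does not already contain. When the real SD representation matters, producing it in the parser is the safer route.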

Source

2017-07-20 06:41:06

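Thanks for your answer. From my own investigation this is what I believed to be true, but when using 'CollapsedCCProcessedDependenciesAnnotation' I still get the same Universal-style dependencies, i.e. 'nmod' still appears where 'prep' should. Is there any way to force a fallback to 'prep'? – gimg1

I have got a bit further and managed to output 'prep' instead of 'nmod'. What I notice now is that the relations have not been collapsed to 'prep_on'. Instead, I have two separate dependencies, 'prep(Bills, on)' and 'pobj(on, immigration)'. How should I collapse these into 'prep_on(Bills, immigration)'? Do I have to do it myself, or is there a method for it? – gimg1

I am out of my depth here, but the place to start spelunking would be 'GrammaticalStructure'. One possibility, however, is that the Stanford Dependencies representation has been neglected for long enough that it has started to rot (e.g. 'CollapsedCCProcessedDependenciesAnnotation' is certainly meant to return the old format). –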
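For the manual route asked about in the comments, collapsing a 'prep' plus 'pobj' pair into a single 'prep_X' relation can be sketched as follows. This is an illustration only, under loudly stated assumptions: the PrepCollapseSketch class and its Dep type are hypothetical and are not CoreNLP's TypedDependency or GrammaticalStructure, tokens are plain strings of the form word-index, and conjunct propagation (the CC-processing step) is not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: folds prep(g, p) + pobj(p, o) into prep_p(g, o). */
public class PrepCollapseSketch {

    /** Minimal stand-in for a typed dependency: rel(gov, dep). */
    public static final class Dep {
        public final String rel;
        public final String gov;
        public final String dep;

        public Dep(String rel, String gov, String dep) {
            this.rel = rel;
            this.gov = gov;
            this.dep = dep;
        }

        @Override
        public String toString() {
            return rel + "(" + gov + ", " + dep + ")";
        }
    }

    /** Strips the "-index" suffix, e.g. "on-2" becomes "on". */
    static String word(String token) {
        int i = token.lastIndexOf('-');
        String w = i > 0 ? token.substring(0, i) : token;
        return w.toLowerCase(Locale.ROOT);
    }

    /** True if some prep relation has the given token as its dependent. */
    static boolean isPrepositionToken(List<Dep> deps, String token) {
        for (Dep d : deps) {
            if (d.rel.equals("prep") && d.dep.equals(token)) {
                return true;
            }
        }
        return false;
    }

    public static List<Dep> collapse(List<Dep> deps) {
        List<Dep> out = new ArrayList<>();
        for (Dep d : deps) {
            // A pobj under a collapsed preposition is folded away.
            if (d.rel.equals("pobj") && isPrepositionToken(deps, d.gov)) {
                continue;
            }
            if (d.rel.equals("prep")) {
                boolean folded = false;
                for (Dep o : deps) {
                    if (o.rel.equals("pobj") && o.gov.equals(d.dep)) {
                        out.add(new Dep("prep_" + word(d.dep), d.gov, o.dep));
                        folded = true;
                    }
                }
                if (folded) {
                    continue;
                }
            }
            out.add(d); // everything else passes through unchanged
        }
        return out;
    }

    public static void main(String[] args) {
        List<Dep> basic = Arrays.asList(
            new Dep("nsubjpass", "submitted-7", "Bills-1"),
            new Dep("prep", "Bills-1", "on-2"),
            new Dep("pobj", "on-2", "ports-3"));
        for (Dep d : collapse(basic)) {
            System.out.println(d);
        }
    }
}
```

A preposition without an object is left as a plain 'prep' relation, so the sketch never silently discards information it cannot fold.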
