How do I stop Stanford CoreNLP from splitting my sentences?

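I have already split my text into sentences, and I have resources that are aligned with those sentences.

How do I stop Stanford CoreNLP from splitting my sentences again before it generates the parse trees?

I am working with Chinese.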

2016-12-03 Nick Qian

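Your description isn't very precise, so I'm not sure I'm interpreting your question correctly. It sounds like you want to give the parser a list of tokens without CoreNLP doing any tokenization itself, right? If so, it would be useful to know which parser you are using. Either way, you only need to hand the parser a list of tokens, and CoreNLP won't jump in and mess up your tokenization. I haven't worked with the Chinese resources, but the following may help (assuming you have already done word segmentation and can split the result on the whitespace between tokens):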

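import java.util.ArrayList;
import java.util.List;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.parser.nndep.DependencyParser;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.Tree;

public class PreTokenizedParse {
    public static void main(String[] args) {
        // The sentence is already tokenized: tokens are separated by single spaces.
        String sentence = "I can't do that .";
        List<HasWord> hwl = new ArrayList<>();
        for (String t : sentence.split(" ")) {
            hwl.add(new Word(t));
        }

        // Constituency parse: the parser receives the token list unchanged.
        LexicalizedParser lexParser = LexicalizedParser.loadModel("<path to chinese lex parsing here>", "-maxLength", "70");
        Tree cTree = lexParser.parse(hwl);
        System.out.println("c tree:" + cTree);

        // Dependency parse: tag the same tokens, then pass them to the dependency parser.
        DependencyParser parser = DependencyParser.loadFromModelFile("<chinese model for dep parsing here>");
        MaxentTagger tagger = new MaxentTagger("<path to your tagger file goes here>");
        List<TaggedWord> tagged = tagger.tagSentence(hwl);
        GrammaticalStructure gs = parser.predict(tagged);
        System.out.println("dep tree:" + gs.typedDependencies());
    }
}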

With the lines written to standard error removed, the output is:

c tree:(ROOT (S (MPN (FM I) (FM can't)) (VVFIN do) (ADJD that) ($. .))) 
dep tree:[nsubj(can't-2, I-1), root(ROOT-0, can't-2), xcomp(can't-2, do-3), dobj(do-3, that-4), punct(can't-2, .-5)]

Hope this helps.
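If your pre-segmented text is stored with one sentence per line and tokens separated by spaces, something like the following could prepare the token lists before they are wrapped in Word objects as above. This is only a minimal sketch: PreSegmentedReader and readSentences are made-up names, not part of CoreNLP, and the one-sentence-per-line layout is an assumption about your data. It uses plain Java only, so your own sentence boundaries are kept exactly as they are in the file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of CoreNLP): turns pre-segmented text into
// one token list per sentence, without re-splitting sentences.
final class PreSegmentedReader {

    // Assumption: one input line = one sentence, tokens separated by ASCII whitespace.
    static List<List<String>> readSentences(String text) {
        List<List<String>> sentences = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines instead of emitting empty sentences
            }
            sentences.add(Arrays.asList(trimmed.split("\\s+")));
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "我 喜欢 苹果 。\n他 不 喜欢 。";
        for (List<String> tokens : readSentences(text)) {
            System.out.println(tokens.size() + " tokens: " + tokens);
        }
    }
}
```

Each inner list can then be turned into a List<HasWord> and passed to lexParser.parse(...) one sentence at a time, so the parser never gets a chance to decide where a sentence ends.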


2016-12-05 16:13:37 Igor

This is a great answer, but I ended up solving the problem with just a command-line call: 'java -Xmx2g -cp "../parsing/*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -sentences newline -outputFormat typedDependencies ../parsing/xinhuaFactored.ser.gz ./cpbdev.out > cpbdev.out.parsing'

I find these APIs really hard. How did you learn them? By reading the javadoc?

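Ah yes, if you are using it from the command line, that approach makes more sense. In general I think the CoreNLP API is fairly well documented, but there are some hidden things you have to find for yourself. Just playing around with it always helps :) – Igor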
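For anyone who wants the effect of the -sentences newline flag from the pipeline API instead of the command line, the CoreNLP properties tokenize.whitespace and ssplit.eolonly are the usual way to say "split tokens on whitespace only" and "split sentences on newlines only". The sketch below only builds the Properties object. The class name PreSplitPipelineProps and the annotator list are assumptions for illustration, and the right annotators and model settings for Chinese depend on your CoreNLP version, so check them against your setup.

```java
import java.util.Properties;

// Hypothetical helper: collects the settings that ask CoreNLP to respect
// existing token and sentence boundaries.
final class PreSplitPipelineProps {

    static Properties build() {
        Properties props = new Properties();
        // Assumption: adjust the annotator list to your language and CoreNLP version.
        props.setProperty("annotators", "tokenize,ssplit,pos,parse");
        // Tokens are whatever is separated by whitespace; no further tokenization.
        props.setProperty("tokenize.whitespace", "true");
        // One line = one sentence; no sentence splitting inside a line.
        props.setProperty("ssplit.eolonly", "true");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The resulting object would be passed to new StanfordCoreNLP(props), with the Chinese model properties added on top.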
