Stanford number named entity recognition

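I am trying to recognize number named entities in text using Stanford NLP. If the text contains, for example, "20 million", what I get back looks like this: "Number": ["20-5", "million-6"]. How can I improve the result so that "20 million" comes back together as one entity? And how can I ignore the index numbers (5, 6) in the example above? I am using Java.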

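import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.util.CoreMap;

// pipeline, number and n are fields of the enclosing class (declarations not shown).
public void extractNumbers(String text) throws IOException {
    number = new HashMap<String, ArrayList<String>>();
    n = new ArrayList<String>();
    edu.stanford.nlp.pipeline.Annotation document = new edu.stanford.nlp.pipeline.Annotation(text);
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
            // Keep only tokens that the NER annotator tagged as NUMBER.
            if ("NUMBER".equals(token.get(CoreAnnotations.NamedEntityTagAnnotation.class))) {
                // token.toString() yields "word-index" (e.g. "20-5"), the output shown above.
                n.add(token.toString());
                number.put("Number", n);
            }
        }
    }
}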

Source

2017-05-06 Rtrut Kuhfd

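You might want to expand on this a bit. Which model did you use? What language are you using? A code snippet would also help us see exactly what you did. – entrophy

@entrophy I edited the question :) –

Which class is the 'pipeline' object here an instance of? As in, are you using the Stanford pipeline? – entrophy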

To get the exact text from any CoreLabel object, simply use token.originalText() instead of token.toString().

If you need anything else from these tokens, have a look at the javadoc for CoreLabel.
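For the first part of the question (getting "20 million" back as one entity), one possible approach is to join consecutive tokens that share the NUMBER tag. The following is only a minimal sketch under stated assumptions: `Tok` and `NumberMerger` are hypothetical names, and `Tok` is a stand-in for CoreLabel so that the example runs without the CoreNLP jars. In the real loop you would presumably fill it from token.originalText() and token.get(CoreAnnotations.NamedEntityTagAnnotation.class).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for CoreLabel: only the surface text and the NER tag.
final class Tok {
    final String text;
    final String ner;

    Tok(String text, String ner) {
        this.text = text;
        this.ner = ner;
    }
}

class NumberMerger {
    // Joins each run of consecutive NUMBER tokens into one space-separated string.
    static List<String> mergeNumbers(List<Tok> tokens) {
        List<String> merged = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (Tok t : tokens) {
            if ("NUMBER".equals(t.ner)) {
                if (run.length() > 0) {
                    run.append(' ');
                }
                run.append(t.text);
            } else if (run.length() > 0) {
                // A non-NUMBER token ends the current run.
                merged.add(run.toString());
                run.setLength(0);
            }
        }
        if (run.length() > 0) {
            merged.add(run.toString()); // run that reaches the end of the input
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Tok> sentence = Arrays.asList(
            new Tok("About", "O"), new Tok("20", "NUMBER"), new Tok("million", "NUMBER"),
            new Tok("people", "O"), new Tok("and", "O"), new Tok("3", "NUMBER"),
            new Tok("dogs", "O"));
        System.out.println(mergeNumbers(sentence)); // prints [20 million, 3]
    }
}
```

Because the sketch works on plain text and tags, it also sidesteps the "-5" / "-6" index suffixes. Two separate numbers that happen to be adjacent would be merged as well, so it is a heuristic rather than a full solution.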

Source

2017-05-06 07:03:24 entrophy

This worked for my second question, thank you very much –
