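I want to create another train.txt to train a sentiment model for a different domain. I found that the data used to train the sentiment model in train.txt is in PTB format, and it looks like this: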
(3 (2 Yet) (3 (2 (2 the) (2 act)) (3 (4 (3 (2 is) (3 (2 still) (4 charming))) (2 here)) (2 .))))
The actual sentence should be
Yet the act is still charming here.
But after parsing it, I get a different structure:
(ROOT (S (CC Yet) (NP (DT the) (NN act)) (VP (VBZ is) (ADJP (RB still) (JJ charming)) (ADVP (RB here))) (. .)))
Here is my code:
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public static void main(String[] args) {
    // create a StanfordCoreNLP pipeline with tokenization, sentence splitting, and parsing
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // the text to parse
    String text = "Yet the act is still charming here ."; // Add your text here!
    // create an empty Annotation just with the given text
    Annotation annotation = new Annotation(text);
    // run all Annotators on this text
    pipeline.annotate(annotation);
    // these are all the sentences in this document
    // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    // int sentiment = 0;
    for (CoreMap sentence : sentences) {
        // get the constituency parse tree of the current sentence
        Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
        // print the tree on one line, then in indented Penn Treebank form
        System.out.println(tree);
        // System.out.println(tree.yield());
        tree.pennPrint(System.out);
        // Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
        // sentiment = RNNCoreAnnotations.getPredictedClass(tree);
    }
    // System.out.print(sentiment);
}
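Two problems come up when I try to create train.txt from my own sentences.
1. My tree is different from the trees in train.txt. I know that the numbers in the latter are sentiment polarity labels, but the tree structure itself also seems to be different. I want to get a binarized parse tree, which might look like this: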
((Yet) (((the) (act)) ((((is) ((still) (charming))) (here)) (.))))
Once I have the sentiment labels, I can fill them in to build my own train.txt.
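The fill-in step described above can be sketched in plain Java, without calling CoreNLP. This is only an illustration under stated assumptions: the class `LabelFiller` and its `fill` method are hypothetical names, the binarized skeleton is assumed to already exist in the bracketed form shown above, and the labels are assumed to arrive as a phrase-to-score map (here copied from the train.txt line quoted earlier). One limitation to be aware of: a phrase that occurs twice in the same sentence would receive the same label.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns an unlabeled bracketed tree into "(label ...)" form.
final class LabelFiller {
    private final String s;                    // skeleton text, e.g. "((the) (act))"
    private final Map<String, Integer> labels; // phrase -> sentiment label
    private int pos = 0;                       // current read position in s

    private LabelFiller(String s, Map<String, Integer> labels) {
        this.s = s;
        this.labels = labels;
    }

    static String fill(String skeleton, Map<String, Integer> labels) {
        return new LabelFiller(skeleton, labels).node()[1];
    }

    // Parses one node and returns {phrase, labeled subtree text}.
    private String[] node() {
        skipSpaces();
        pos++; // consume '('
        List<String> words = new ArrayList<>();
        List<String> rendered = new ArrayList<>();
        skipSpaces();
        while (s.charAt(pos) != ')') {
            if (s.charAt(pos) == '(') {        // child node
                String[] child = node();
                words.add(child[0]);
                rendered.add(child[1]);
            } else {                           // bare word inside a leaf
                String word = token();
                words.add(word);
                rendered.add(word);
            }
            skipSpaces();
        }
        pos++; // consume ')'
        String phrase = String.join(" ", words);
        Integer label = labels.get(phrase);
        if (label == null) {
            throw new IllegalArgumentException("No label for phrase: " + phrase);
        }
        return new String[] {phrase, "(" + label + " " + String.join(" ", rendered) + ")"};
    }

    private String token() {
        int start = pos;
        while (pos < s.length() && "() \t".indexOf(s.charAt(pos)) < 0) pos++;
        return s.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        // Labels copied from the train.txt example line; in practice they come from annotators.
        String[] phrases = {"Yet", "the", "act", "the act", "is", "still", "charming",
            "still charming", "is still charming", "here", "is still charming here", ".",
            "is still charming here .", "the act is still charming here .",
            "Yet the act is still charming here ."};
        int[] scores = {2, 2, 2, 2, 2, 2, 4, 3, 3, 2, 4, 2, 3, 3, 3};
        Map<String, Integer> labels = new HashMap<>();
        for (int i = 0; i < phrases.length; i++) labels.put(phrases[i], scores[i]);
        String skeleton = "((Yet) (((the) (act)) ((((is) ((still) (charming))) (here)) (.))))";
        System.out.println(fill(skeleton, labels));
    }
}
```

Keying the labels by phrase text keeps the annotation file easy for humans to edit, at the cost of the repeated-phrase limitation noted above; keying by node position would avoid that.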
2. How do I get the phrase covered by each node of the binarized parse tree? In this example, I should get
Yet
the
act
the act
is
still
charming
still charming
is still charming
here
is still charming here
.
is still charming here .
the act is still charming here .
Yet the act is still charming here.
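Once I have them, I can pay human annotators to label them.
I have actually googled this a lot but could not solve either problem, so I am posting here. Any helpful answers would be greatly appreciated!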
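For reference, the list above is a post-order walk of the tree: each node's phrase is the concatenation of the words beneath it, emitted after its children. Below is a minimal plain-Java sketch of that walk. It does not use CoreNLP; `NodePhrases` and `phrases` are hypothetical names, and the input is assumed to be a bracketed tree where every node starts with a label, as in train.txt or in the parser's printed output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the phrase under every node of a labeled bracketed tree.
final class NodePhrases {
    private final String s;                      // tree text, e.g. "(2 (2 the) (2 act))"
    private int pos = 0;                         // current read position in s
    private final List<String> out = new ArrayList<>();

    private NodePhrases(String s) {
        this.s = s;
    }

    /** Returns one phrase per node, children before parents (post-order). */
    static List<String> phrases(String tree) {
        NodePhrases p = new NodePhrases(tree);
        p.node();
        return p.out;
    }

    // Parses "(LABEL child child ...)" or "(LABEL word)" and returns its phrase.
    private String node() {
        skipSpaces();
        pos++;   // consume '('
        token(); // read and discard the node label
        List<String> parts = new ArrayList<>();
        skipSpaces();
        while (s.charAt(pos) != ')') {
            parts.add(s.charAt(pos) == '(' ? node() : token());
            skipSpaces();
        }
        pos++;   // consume ')'
        String phrase = String.join(" ", parts);
        out.add(phrase);
        return phrase;
    }

    private String token() {
        skipSpaces();
        int start = pos;
        while (pos < s.length() && "() \t".indexOf(s.charAt(pos)) < 0) pos++;
        return s.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        String tree = "(3 (2 Yet) (3 (2 (2 the) (2 act)) "
            + "(3 (4 (3 (2 is) (3 (2 still) (4 charming))) (2 here)) (2 .))))";
        for (String phrase : phrases(tree)) {
            System.out.println(phrase);
        }
    }
}
```

On the example tree this yields 15 phrases, from "Yet" up to the full sentence. A tree with unary nodes, such as the ROOT node in the parser output, would list the same phrase more than once, so duplicates may need to be removed before sending phrases to annotators.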
Great! If I want to train a Chinese sentiment model, do the sentences in train.txt still need to be binarized parses? @StanfordNLPHelp – ryh