
java - Creating another train.txt to train a sentiment model for a different domain

Reposted. Author: 行者123. Updated: 2023-12-02 03:13:58

I found that the data in train.txt used to train the sentiment model is in PTB (Penn Treebank) format, as shown below:

(3 (2 Yet) (3 (2 (2 the) (2 act)) (3 (4 (3 (2 is) (3 (2 still) (4 charming))) (2 here)) (2 .))))

The actual sentence is:

Yet the act is still charming here.
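Each line of train.txt is one sentence written as a bracketed tree. Every word sits in a (label word) bracket, every phrase sits in a (label child child) bracket, and the label is the sentiment class of that word or phrase (0 = very negative up to 4 = very positive in the Stanford Sentiment Treebank). Below is a minimal, dependency-free sketch of a reader for such a line, useful for checking your own train.txt. The names SentimentTreeReader, parse and sentence are made up for illustration and are not CoreNLP APIs. For the line above it prints the root label 3, a tab, and the sentence Yet the act is still charming here .

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only (hypothetical helper, not part of CoreNLP): reads one PTB-style
// sentiment line. Assumes every bracket is "(label word)" or "(label child child ...)".
final class SentimentTreeReader {

    static final class Node {
        final String label;                         // sentiment class, e.g. "3"
        final String word;                          // non-null only for word nodes
        final List<Node> kids = new ArrayList<>();  // empty for word nodes

        Node(String label, String word) {
            this.label = label;
            this.word = word;
        }
    }

    static Node parse(String line) {
        // Split into "(", ")" and whitespace-separated atoms.
        List<String> toks = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : line.toCharArray()) {
            boolean paren = c == '(' || c == ')';
            if (paren || Character.isWhitespace(c)) {
                if (cur.length() > 0) {
                    toks.add(cur.toString());
                    cur.setLength(0);
                }
                if (paren) toks.add(String.valueOf(c));
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) toks.add(cur.toString());
        int[] pos = {0};
        Node root = read(toks, pos);
        if (pos[0] != toks.size()) throw new IllegalArgumentException("tokens after the root");
        return root;
    }

    // Reads one bracket starting at toks[pos[0]] and moves pos[0] past it.
    private static Node read(List<String> toks, int[] pos) {
        expect(toks, pos, "(");
        String label = toks.get(pos[0]++);
        if (label.equals("(") || label.equals(")")) {
            throw new IllegalArgumentException("bracket without a label");
        }
        Node node;
        if (toks.get(pos[0]).equals("(")) {          // "(label child child ...)"
            node = new Node(label, null);
            while (toks.get(pos[0]).equals("(")) node.kids.add(read(toks, pos));
        } else if (toks.get(pos[0]).equals(")")) {
            throw new IllegalArgumentException("bracket without a word");
        } else {                                     // "(label word)"
            node = new Node(label, toks.get(pos[0]++));
        }
        expect(toks, pos, ")");
        return node;
    }

    private static void expect(List<String> toks, int[] pos, String tok) {
        if (pos[0] >= toks.size() || !toks.get(pos[0]).equals(tok)) {
            throw new IllegalArgumentException("expected '" + tok + "' at token " + pos[0]);
        }
        pos[0]++;
    }

    // The words under n, left to right, joined with single spaces.
    static String sentence(Node n) {
        if (n.word != null) return n.word;
        List<String> parts = new ArrayList<>();
        for (Node k : n.kids) parts.add(sentence(k));
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        Node root = parse("(3 (2 Yet) (3 (2 (2 the) (2 act)) (3 (4 (3 (2 is) "
                + "(3 (2 still) (4 charming))) (2 here)) (2 .))))");
        System.out.println(root.label + "\t" + sentence(root));
    }
}
```

Note that the recovered sentence keeps the tokenized form, with a space before the final period.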

But after parsing it, I got a different structure:

(ROOT (S (CC Yet) (NP (DT the) (NN act)) (VP (VBZ is) (ADJP (RB still) (JJ charming)) (ADVP (RB here))) (. .)))
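The two trees differ in shape, not only in their labels. In the train.txt line every internal node has exactly two children, while the parser's tree has unary nodes (one child) and nodes with three or four children. The dependency-free sketch below lists both kinds of node for a bracketed tree; TreeShapeCheck, parse and shape are hypothetical names, and the bracket reader is a plain-Java assumption rather than a CoreNLP API. For the parse above it prints unary: [ROOT, ADVP], more than two children: [S/4, VP/3]; for the train.txt line both lists are empty.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only (hypothetical helper): reports where a labeled bracketed tree is not binary.
final class TreeShapeCheck {

    static final class Node {
        final String label;                         // constituent label, e.g. "VP"
        final String word;                          // non-null only for "(TAG word)"
        final List<Node> kids = new ArrayList<>();  // empty for word nodes

        Node(String label, String word) {
            this.label = label;
            this.word = word;
        }
    }

    static Node parse(String line) {
        // Split into "(", ")" and whitespace-separated atoms.
        List<String> toks = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : line.toCharArray()) {
            boolean paren = c == '(' || c == ')';
            if (paren || Character.isWhitespace(c)) {
                if (cur.length() > 0) {
                    toks.add(cur.toString());
                    cur.setLength(0);
                }
                if (paren) toks.add(String.valueOf(c));
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) toks.add(cur.toString());
        int[] pos = {0};
        Node root = read(toks, pos);
        if (pos[0] != toks.size()) throw new IllegalArgumentException("tokens after the root");
        return root;
    }

    private static Node read(List<String> toks, int[] pos) {
        expect(toks, pos, "(");
        String label = toks.get(pos[0]++);
        if (label.equals("(") || label.equals(")")) {
            throw new IllegalArgumentException("bracket without a label");
        }
        Node node;
        if (toks.get(pos[0]).equals("(")) {          // "(LABEL child child ...)"
            node = new Node(label, null);
            while (toks.get(pos[0]).equals("(")) node.kids.add(read(toks, pos));
        } else if (toks.get(pos[0]).equals(")")) {
            throw new IllegalArgumentException("bracket without a word");
        } else {                                     // "(TAG word)"
            node = new Node(label, toks.get(pos[0]++));
        }
        expect(toks, pos, ")");
        return node;
    }

    private static void expect(List<String> toks, int[] pos, String tok) {
        if (pos[0] >= toks.size() || !toks.get(pos[0]).equals(tok)) {
            throw new IllegalArgumentException("expected '" + tok + "' at token " + pos[0]);
        }
        pos[0]++;
    }

    // Pre-order walk: records internal nodes with one child and with more than two.
    static void shape(Node n, List<String> unary, List<String> wide) {
        if (n.word != null) return;                  // word nodes such as (DT the) are fine
        if (n.kids.size() == 1) unary.add(n.label);
        if (n.kids.size() > 2) wide.add(n.label + "/" + n.kids.size());
        for (Node k : n.kids) shape(k, unary, wide);
    }

    public static void main(String[] args) {
        Node parsed = parse("(ROOT (S (CC Yet) (NP (DT the) (NN act)) (VP (VBZ is) "
                + "(ADJP (RB still) (JJ charming)) (ADVP (RB here))) (. .)))");
        List<String> unary = new ArrayList<>();
        List<String> wide = new ArrayList<>();
        shape(parsed, unary, wide);
        System.out.println("unary: " + unary + ", more than two children: " + wide);
    }
}
```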

Here is my code:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

import java.util.List;
import java.util.Properties;

public class ParseExample {

    public static void main(String[] args) {
        // creates a StanfordCoreNLP object with tokenization, sentence splitting, and constituency parsing
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // the text to parse; add your own text here
        String text = "Yet the act is still charming here .";

        // create an empty Annotation just with the given text
        Annotation annotation = new Annotation(text);

        // run all annotators on this text
        pipeline.annotate(annotation);

        // these are all the sentences in this document
        // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);

        // int sentiment = 0;
        for (CoreMap sentence : sentences) {
            // the constituency parse tree of the current sentence
            Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
            System.out.println(tree);
            // System.out.println(tree.yield());
            tree.pennPrint(System.out);
            // Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
            // sentiment = RNNCoreAnnotations.getPredictedClass(tree);
        }

        // System.out.print(sentiment);
    }
}

Then, when I tried to build a train.txt from my own sentences, I ran into two problems.

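1. My trees differ from the ones in train.txt. I know the numbers in the latter are sentiment polarity labels, but the tree structure itself also seems different. I want a binarized parse tree, which might look like this: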

((Yet) (((the) (act)) ((((is) ((still) (charming))) (here)) (.))))

Once I have the sentiment labels, I can fill them in to build my own train.txt.
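Filling the labels in is mechanical once the unlabeled binary tree exists. The plain-Java sketch below assumes the labels are collected in a map from phrase text to label; TrainLineBuilder, parse and toTrainLine are hypothetical names, not CoreNLP APIs. It writes each word as (label word) and each internal node as (label child child). With the labels taken from the train.txt line at the top of the question, applying it to the unlabeled tree above reproduces that line exactly, which shows that the desired binarized tree is the train.txt tree with its numbers removed. Because the map is keyed by phrase text, a phrase that occurs twice in one sentence would receive the same label in both places.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only (hypothetical helper): builds one train.txt-style line from an
// unlabeled binarized tree such as "((the) (act))" plus a phrase -> label map.
final class TrainLineBuilder {

    static final class Node {
        final String word;                          // non-null only for "(word)"
        final List<Node> kids = new ArrayList<>();  // children of an internal node

        Node(String word) {
            this.word = word;
        }
    }

    static Node parse(String s) {
        // Split into "(", ")" and whitespace-separated atoms.
        List<String> toks = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : s.toCharArray()) {
            boolean paren = c == '(' || c == ')';
            if (paren || Character.isWhitespace(c)) {
                if (cur.length() > 0) {
                    toks.add(cur.toString());
                    cur.setLength(0);
                }
                if (paren) toks.add(String.valueOf(c));
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) toks.add(cur.toString());
        int[] pos = {0};
        Node root = read(toks, pos);
        if (pos[0] != toks.size()) throw new IllegalArgumentException("tokens after the root");
        return root;
    }

    private static Node read(List<String> toks, int[] pos) {
        expect(toks, pos, "(");
        Node node;
        if (toks.get(pos[0]).equals("(")) {          // internal node: "(child child ...)"
            node = new Node(null);
            while (toks.get(pos[0]).equals("(")) node.kids.add(read(toks, pos));
        } else if (toks.get(pos[0]).equals(")")) {
            throw new IllegalArgumentException("empty bracket at token " + pos[0]);
        } else {                                     // word node: "(word)"
            node = new Node(toks.get(pos[0]++));
        }
        expect(toks, pos, ")");
        return node;
    }

    private static void expect(List<String> toks, int[] pos, String tok) {
        if (pos[0] >= toks.size() || !toks.get(pos[0]).equals(tok)) {
            throw new IllegalArgumentException("expected '" + tok + "' at token " + pos[0]);
        }
        pos[0]++;
    }

    // The words spanned by n, joined with single spaces.
    static String phrase(Node n) {
        if (n.word != null) return n.word;
        List<String> parts = new ArrayList<>();
        for (Node k : n.kids) parts.add(phrase(k));
        return String.join(" ", parts);
    }

    // Emits "(label word)" for words and "(label child child)" for internal nodes.
    static String toTrainLine(Node n, Map<String, String> labelOf) {
        String label = labelOf.get(phrase(n));
        if (label == null) throw new IllegalArgumentException("no label for: " + phrase(n));
        if (n.word != null) return "(" + label + " " + n.word + ")";
        List<String> parts = new ArrayList<>();
        for (Node k : n.kids) parts.add(toTrainLine(k, labelOf));
        return "(" + label + " " + String.join(" ", parts) + ")";
    }

    public static void main(String[] args) {
        Node skeleton = parse("((the) (act))");
        Map<String, String> labels = Map.of("the", "2", "act", "2", "the act", "2");
        System.out.println(toTrainLine(skeleton, labels)); // (2 (2 the) (2 act))
    }
}
```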

2. How can I get the phrase for every node of the binarized parse tree? In this example, I should get:

Yet
the
act
the act
is
still
charming
still charming
is still charming
here
is still charming here
.
is still charming here .
the act is still charming here .
Yet the act is still charming here .
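The list above is a post-order traversal of the binarized tree: each node's phrase comes right after the phrases of its children. The dependency-free sketch below works on the unlabeled bracket form from question 1; PhraseLister and phrases are hypothetical names, not CoreNLP APIs. Applied to ((Yet) (((the) (act)) ...)) it returns the 15 phrases listed above, in the same order, with the last one as Yet the act is still charming here . (the period is a separate token in the tree). It assumes unary nodes have already been collapsed; otherwise the same phrase would be emitted once per unary level.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only (hypothetical helper): lists the phrase of every node of an
// unlabeled binarized tree in post-order (children before their parent).
final class PhraseLister {

    static final class Node {
        final String word;                          // non-null only for "(word)"
        final List<Node> kids = new ArrayList<>();  // children of an internal node

        Node(String word) {
            this.word = word;
        }
    }

    static Node parse(String s) {
        // Split into "(", ")" and whitespace-separated atoms.
        List<String> toks = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : s.toCharArray()) {
            boolean paren = c == '(' || c == ')';
            if (paren || Character.isWhitespace(c)) {
                if (cur.length() > 0) {
                    toks.add(cur.toString());
                    cur.setLength(0);
                }
                if (paren) toks.add(String.valueOf(c));
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) toks.add(cur.toString());
        int[] pos = {0};
        Node root = read(toks, pos);
        if (pos[0] != toks.size()) throw new IllegalArgumentException("tokens after the root");
        return root;
    }

    private static Node read(List<String> toks, int[] pos) {
        expect(toks, pos, "(");
        Node node;
        if (toks.get(pos[0]).equals("(")) {          // internal node: "(child child ...)"
            node = new Node(null);
            while (toks.get(pos[0]).equals("(")) node.kids.add(read(toks, pos));
        } else if (toks.get(pos[0]).equals(")")) {
            throw new IllegalArgumentException("empty bracket at token " + pos[0]);
        } else {                                     // word node: "(word)"
            node = new Node(toks.get(pos[0]++));
        }
        expect(toks, pos, ")");
        return node;
    }

    private static void expect(List<String> toks, int[] pos, String tok) {
        if (pos[0] >= toks.size() || !toks.get(pos[0]).equals(tok)) {
            throw new IllegalArgumentException("expected '" + tok + "' at token " + pos[0]);
        }
        pos[0]++;
    }

    // Appends the phrases of n's subtree to out (children first) and returns n's phrase.
    static String collect(Node n, List<String> out) {
        String phrase;
        if (n.word != null) {
            phrase = n.word;
        } else {
            List<String> parts = new ArrayList<>();
            for (Node k : n.kids) parts.add(collect(k, out));
            phrase = String.join(" ", parts);
        }
        out.add(phrase);
        return phrase;
    }

    static List<String> phrases(String tree) {
        List<String> out = new ArrayList<>();
        collect(parse(tree), out);
        return out;
    }

    public static void main(String[] args) {
        String tree = "((Yet) (((the) (act)) ((((is) ((still) (charming))) (here)) (.))))";
        for (String p : phrases(tree)) System.out.println(p);
    }
}
```

Each printed line can then be handed to annotators as one item to label.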

Once I have them, I can pay human annotators to label them.

I have searched Google extensively but could not solve this, so I am posting here. Any helpful answer would be greatly appreciated!

Best answer

Add this to the properties to get binary trees:

props.setProperty("parse.binaryTrees", "true");

The binary tree for a sentence can then be accessed with:

Tree tree = sentence.get(TreeCoreAnnotations.BinarizedTreeAnnotation.class);

Here is some sample code I wrote:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.*;

import java.util.ArrayList;
import java.util.Properties;

public class SubTreesExample {

    // Prints every non-leaf node in pre-order: its label, a tab, and the words it spans.
    // Each level of recursion is indented by one more space.
    public static void printSubTrees(Tree inputTree, String spacing) {
        if (inputTree.isLeaf()) {
            return; // a leaf is a single word; it is already printed by its parent (the preterminal)
        }
        ArrayList<Word> words = new ArrayList<>();
        for (Tree leaf : inputTree.getLeaves()) {
            words.addAll(leaf.yieldWords());
        }
        System.out.print(spacing + inputTree.label() + "\t");
        for (Word w : words) {
            System.out.print(w.word() + " ");
        }
        System.out.println();
        for (Tree subTree : inputTree.children()) {
            printSubTrees(subTree, spacing + " ");
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
        // ask the parser to also store a binarized version of each parse tree
        props.setProperty("parse.binaryTrees", "true");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "Yet the act is still charming here.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        // binarized tree of the first (and only) sentence
        Tree sentenceTree = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0).get(
                TreeCoreAnnotations.BinarizedTreeAnnotation.class);
        System.out.println("Penn tree:");
        sentenceTree.pennPrint(System.out);
        System.out.println();
        System.out.println("Phrases:");
        printSubTrees(sentenceTree, "");
    }
}
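The binarized tree stored under BinarizedTreeAnnotation still carries constituent labels and may contain unary nodes (for instance a ROOT node above S), whereas every bracket in train.txt is either a single word or a node with exactly two children. The plain-Java sketch below turns a labeled binarized tree into the unlabeled form asked for in question 1 by dropping the labels and collapsing unary chains. The input string is hand-written in the shape of a binarized tree, with @S and @VP marking intermediate nodes in this example; it is an assumption for illustration, and CoreNLP's actual binarized output for this sentence may group or label nodes differently. SkeletonExtractor and skeleton are hypothetical names, not CoreNLP APIs. On this input it returns ((Yet) (((the) (act)) ((((is) ((still) (charming))) (here)) (.)))), the tree from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only (hypothetical helper): drops labels and collapses unary nodes of a
// labeled, already-binarized bracketed tree.
final class SkeletonExtractor {

    static final class Node {
        final String label;                         // e.g. "@S", "NP", "DT"
        final String word;                          // non-null only for "(TAG word)"
        final List<Node> kids = new ArrayList<>();  // empty for word nodes

        Node(String label, String word) {
            this.label = label;
            this.word = word;
        }
    }

    static Node parse(String line) {
        // Split into "(", ")" and whitespace-separated atoms.
        List<String> toks = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : line.toCharArray()) {
            boolean paren = c == '(' || c == ')';
            if (paren || Character.isWhitespace(c)) {
                if (cur.length() > 0) {
                    toks.add(cur.toString());
                    cur.setLength(0);
                }
                if (paren) toks.add(String.valueOf(c));
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) toks.add(cur.toString());
        int[] pos = {0};
        Node root = read(toks, pos);
        if (pos[0] != toks.size()) throw new IllegalArgumentException("tokens after the root");
        return root;
    }

    private static Node read(List<String> toks, int[] pos) {
        expect(toks, pos, "(");
        String label = toks.get(pos[0]++);
        if (label.equals("(") || label.equals(")")) {
            throw new IllegalArgumentException("bracket without a label");
        }
        Node node;
        if (toks.get(pos[0]).equals("(")) {          // "(LABEL child child ...)"
            node = new Node(label, null);
            while (toks.get(pos[0]).equals("(")) node.kids.add(read(toks, pos));
        } else if (toks.get(pos[0]).equals(")")) {
            throw new IllegalArgumentException("bracket without a word");
        } else {                                     // "(TAG word)"
            node = new Node(label, toks.get(pos[0]++));
        }
        expect(toks, pos, ")");
        return node;
    }

    private static void expect(List<String> toks, int[] pos, String tok) {
        if (pos[0] >= toks.size() || !toks.get(pos[0]).equals(tok)) {
            throw new IllegalArgumentException("expected '" + tok + "' at token " + pos[0]);
        }
        pos[0]++;
    }

    // Words become "(word)"; unary nodes are skipped; binary nodes keep both children.
    static String skeleton(Node n) {
        if (n.word != null) return "(" + n.word + ")";
        if (n.kids.size() == 1) return skeleton(n.kids.get(0));
        if (n.kids.size() > 2) {
            throw new IllegalArgumentException(n.label + " has " + n.kids.size() + " children; binarize first");
        }
        return "(" + skeleton(n.kids.get(0)) + " " + skeleton(n.kids.get(1)) + ")";
    }

    public static void main(String[] args) {
        // Hand-written binarized tree (an assumption, not verified CoreNLP output).
        String binarized = "(ROOT (S (CC Yet) (@S (NP (DT the) (NN act)) (@S (VP (@VP (VBZ is) "
                + "(ADJP (RB still) (JJ charming))) (ADVP (RB here))) (. .)))))";
        System.out.println(skeleton(parse(binarized)));
    }
}
```

Nodes with more than two children are rejected rather than silently binarized, since how to binarize them (left, right, or head-outward) is a separate choice.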

A similar question, "java - Creating another train.txt to train a sentiment model for a different domain", can be found on Stack Overflow: https://stackoverflow.com/questions/40604719/
