
java - How to get XML output of text for Stanford CoreNLP

Reposted. Author: 塔克拉玛干. Updated: 2023-11-02 07:49:04

I have been reading the API and the documentation trying to find an answer, but I am no closer to solving the problem.

I want to take a bunch of sentences and output all of them as XML, like this:

    <token id="1">
      <word>That</word>
      <lemma>that</lemma>
      <CharacterOffsetBegin>0</CharacterOffsetBegin>
      <CharacterOffsetEnd>4</CharacterOffsetEnd>
      <POS>DT</POS>
      <NER>O</NER>
    </token>

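I have only figured out how to get the parse tree, but that does not help with what I want to build. Anyway, here is the code I am using right now: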

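    import java.util.List;
    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.semgraph.SemanticGraph;
    import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
    import edu.stanford.nlp.util.CoreMap;

    // configure the pipeline with the annotators to run
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // read some text in the text variable
    String text = "We won the game."; // Add your text here!

    // create an empty Annotation just with the given text
    Annotation document = new Annotation(text);

    // run all Annotators on this text
    pipeline.annotate(document);

    // these are all the sentences in this document
    // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for (CoreMap sentence : sentences) {
        // this is the parse tree of the current sentence
        Tree tree = sentence.get(TreeAnnotation.class);

        // this is the Stanford dependency graph of the current sentence
        SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
    }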

I am using the code from the documentation.

Best answer

It is a bit easier to use the built-in xmlPrint:

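    import java.io.File;
    import java.io.FileOutputStream;
    import java.util.Properties;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    Annotation document = new Annotation("Four score and seven years ago.");
    pipeline.annotate(document);
    try (FileOutputStream os = new FileOutputStream(new File("./target/", "nlp.xml"))) {
        pipeline.xmlPrint(document, os);
    }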
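
If the layout produced by xmlPrint does not fit and you would rather emit the token elements yourself, something along these lines might work. This is only a sketch using the JDK's XMLStreamWriter, not CoreNLP's own output code. The `TokenXmlSketch` class, its nested `Token` holder and the `<tokens>` wrapper element are hypothetical names made up for illustration; in a real pipeline the field values would presumably be copied from each CoreLabel (for example via its word(), lemma(), beginPosition(), endPosition(), tag() and ner() accessors).

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class TokenXmlSketch {

    /** Hypothetical plain holder for per-token values; not a CoreNLP class. */
    public static final class Token {
        final String word;
        final String lemma;
        final int begin;
        final int end;
        final String pos;
        final String ner;

        public Token(String word, String lemma, int begin, int end, String pos, String ner) {
            this.word = word;
            this.lemma = lemma;
            this.begin = begin;
            this.end = end;
            this.pos = pos;
            this.ner = ner;
        }
    }

    /** Serializes the tokens in the element layout shown in the question. */
    public static String toXml(List<Token> tokens) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("tokens"); // assumed wrapper element
            int id = 1;                    // token ids start at 1, as in the question
            for (Token t : tokens) {
                w.writeStartElement("token");
                w.writeAttribute("id", Integer.toString(id++));
                element(w, "word", t.word);
                element(w, "lemma", t.lemma);
                element(w, "CharacterOffsetBegin", Integer.toString(t.begin));
                element(w, "CharacterOffsetEnd", Integer.toString(t.end));
                element(w, "POS", t.pos);
                element(w, "NER", t.ner);
                w.writeEndElement();
            }
            w.writeEndElement();
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write token XML", e);
        }
        return out.toString();
    }

    // Writes <name>value</name>; writeCharacters escapes &, < and > in the value.
    private static void element(XMLStreamWriter w, String name, String value)
            throws XMLStreamException {
        w.writeStartElement(name);
        w.writeCharacters(value);
        w.writeEndElement();
    }

    public static void main(String[] args) {
        System.out.println(toXml(List.of(new Token("That", "that", 0, 4, "DT", "O"))));
    }
}
```

Going through a stream writer rather than string concatenation means special characters in the token text are escaped for you. The output is written on a single line without indentation.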

Regarding "java - How to get XML output of text for Stanford CoreNLP", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18678595/
