
java - Stanford NLP is very slow at annotating text

Reposted. Author: 行者123. Updated: 2023-11-30 08:54:05

I am using Stanford CoreNLP for a Java NLP project on a Windows machine. I want to annotate a large text article with it. The code I wrote is as follows:

import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref, regexner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props); // loads all models once
Annotation document = new Annotation("Text to be annotated. This text is very long!");
pipeline.annotate(document); // this line takes a long time

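Annotating the text takes quite a long time. For roughly 60 words, that one line takes about 16 seconds, which is far too long.

Is there a way to speed this up, or is there something I am missing? Please let me know what I can do. Thanks in advance :-)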

Edit

Code sample

public TextReader() {
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, regexner");
    pipeline = new StanfordCoreNLP(props);
    extractor = CoreMapExpressionExtractor.createExtractorFromFiles(
            TokenSequencePattern.getNewEnv(),
            "Stanford NLP\\stanford-corenlp-full-2015-01-29\\stanford-corenlp-full-2015-01-30\\tokensregex\\color.rules.txt");
    text = "Barak Obama was born on August 4, 1961,at Kapiolani Maternity & Gynecological Hospital "
            + " in Honolulu, Hawaii, and would become the first President to have been born in Hawaii. His mother, Stanley Ann Dunham,"
            + " was born in Wichita, Kansas, and was of mostly English ancestry. His father, Barack Obama, Sr., was a Luo from Nyang’oma"
            + " Kogelo, Kenya. He studied at the University of Westminster. His favourite colour is red.";
    Logger.getLogger(TextReader.class.getName()).log(Level.INFO, "Annotator starting...", text); // LOG 1
    Annotation document = new Annotation(text);
    pipeline.annotate(document);
    Logger.getLogger(TextReader.class.getName()).log(Level.INFO, "Annotator finished...", props); // LOG 2
    sentences = document.get(SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        // the tokens of the sentence are taken and iterated over;
        // the NER, POS and lemma of each token are stored on each iteration
    }
}

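I found that the time between LOG 1 and LOG 2 is about 16 seconds. I need to process much longer texts, and that would take a very long time. Please tell me what I am doing wrong.

Thanks =D

Best answer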

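Is the text one long sentence? The parser's runtime is O(n^3) in the length of the sentence, so it becomes quite slow for sentences of more than about 40 words. Does it speed up if you remove the "parse, dcoref, regexner" annotators? And if you add "parse" back, does it become slow again?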
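
To make the O(n^3) point concrete, here is a rough back-of-the-envelope sketch. It assumes that parsing cost is exactly proportional to the cube of the token count, which is a simplification that ignores model loading and constant factors. The class and method names are made up for illustration and are not part of CoreNLP.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: relative parser cost under a pure n^3 model (an assumption). */
final class ParseCostEstimate {

    /** Relative cost of parsing one sentence of the given token count, assuming cost ~ n^3. */
    static long cubicCost(int tokens) {
        return (long) tokens * tokens * tokens;
    }

    /** Total relative cost of a text, given the token count of each of its sentences. */
    static long totalCost(List<Integer> sentenceLengths) {
        long total = 0;
        for (int n : sentenceLengths) {
            total += cubicCost(n);
        }
        return total;
    }

    public static void main(String[] args) {
        // The same 60 tokens, either as one sentence or split into three.
        long oneLong = totalCost(Arrays.asList(60));
        long threeShort = totalCost(Arrays.asList(20, 20, 20));
        System.out.println("one 60-token sentence:    " + oneLong);    // 216000
        System.out.println("three 20-token sentences: " + threeShort); // 24000
        System.out.println("ratio:                    " + (oneLong / threeShort)); // 9
    }
}
```

Under this model, the same 60 tokens cost about nine times more to parse as a single sentence than as three 20-token sentences, which is why the answer asks about sentence length rather than total text length.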

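If what you care about is dependency parses rather than constituency parses, the new "depparse" annotator will produce them much faster; however, our coref does not yet work with dependency parses (coming soon!).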
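
The experiment suggested above amounts to comparing a few annotator lists. The sketch below only builds the `Properties` objects, so it runs without CoreNLP on the classpath. The class and method names are hypothetical, and the exact annotator lists are an assumption based on the ones in the question, with "dcoref" and "regexner" left out. Each result would be passed to `new StanfordCoreNLP(props)` as in the question, and `pipeline.annotate(document)` timed for each variant.

```java
import java.util.Properties;

/** Hypothetical helper that builds the annotator configurations to compare. */
final class AnnotatorConfigs {

    // Annotators that do not involve any syntactic parsing.
    private static final String BASE = "tokenize, ssplit, pos, lemma, ner";

    static Properties forAnnotators(String annotators) {
        Properties props = new Properties();
        props.setProperty("annotators", annotators);
        return props;
    }

    /** No parser at all: the fast baseline. */
    static Properties baseline() {
        return forAnnotators(BASE);
    }

    /** Adds the constituency parser ("parse"), the O(n^3) step. */
    static Properties withConstituencyParse() {
        return forAnnotators(BASE + ", parse");
    }

    /** Swaps in the dependency parser ("depparse") instead of "parse". */
    static Properties withDependencyParse() {
        return forAnnotators(BASE + ", depparse");
    }

    public static void main(String[] args) {
        System.out.println(baseline().getProperty("annotators"));
        System.out.println(withConstituencyParse().getProperty("annotators"));
        System.out.println(withDependencyParse().getProperty("annotators"));
    }
}
```

If the baseline is fast and only the "parse" variant is slow, the constituency parser is the bottleneck, and "depparse" is worth trying as long as coreference is not required.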

Regarding "java - Stanford NLP is very slow at annotating text", the original question is on Stack Overflow: https://stackoverflow.com/questions/29543274/
