I am trying to perform coreference resolution with Stanford CoreNLP. The version I am using is stanford-corenlp-full-2015-12-09. Basically, I wrote a few classes:
import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
public class CorefResolution {
    public static String corefResolute(String text, List<String> tokenToReplace) {
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        // Note: constructing the pipeline loads all models and is expensive;
        // if this method is called once per line, consider building it once and reusing it.
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        Annotation doc = new Annotation(text);
        pipeline.annotate(doc);
        Map<Integer, CorefChain> corefs = doc.get(CorefCoreAnnotations.CorefChainAnnotation.class);
        System.out.println(corefs);
        List<CoreMap> sentences = doc.get(CoreAnnotations.SentencesAnnotation.class);
        List<String> resolved = new ArrayList<String>();
        for (CoreMap sentence : sentences) {
            List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
            for (CoreLabel token : tokens) {
                Integer corefClustId = token.get(CorefCoreAnnotations.CorefClusterIdAnnotation.class);
                if (corefClustId == null) {
                    System.out.println("NULL NULL NULL\n");
                    resolved.add(token.word());
                    continue;
                } else {
                    System.out.println("Exist Exist Exist\n");
                }
                System.out.println("coreClustId is " + corefClustId.toString() + "\n");
                CorefChain chain = corefs.get(corefClustId);
                if (chain == null || chain.getMentionsInTextualOrder().size() == 1) {
                    resolved.add(token.word());
                } else {
                    int sentINdx = chain.getRepresentativeMention().sentNum - 1;
                    CoreMap corefSentence = sentences.get(sentINdx);
                    List<CoreLabel> corefSentenceTokens = corefSentence.get(CoreAnnotations.TokensAnnotation.class);
                    CorefChain.CorefMention reprMent = chain.getRepresentativeMention();
                    if (tokenToReplace.contains(token.word())) {
                        // Replace the token with the representative mention, word by word
                        for (int i = reprMent.startIndex; i < reprMent.endIndex; i++) {
                            CoreLabel matchedLabel = corefSentenceTokens.get(i - 1);
                            resolved.add(matchedLabel.word());
                        }
                    } else {
                        resolved.add(token.word());
                    }
                }
            }
        }
        Detokenizer detokenizer = new Detokenizer();
        String resolvedStr = detokenizer.detokenize(resolved);
        return resolvedStr;
    }
}
Another class:
import java.util.Arrays;
import java.util.List;
import java.util.LinkedList;
public class Detokenizer {
    public String detokenize(List<String> tokens) {
        // Define lists of punctuation tokens that should NOT have a space before or after them
        List<String> noSpaceBefore = new LinkedList<String>(Arrays.asList(",", ".", ";", ":", ")", "}", "]", "'", "'s", "n't"));
        List<String> noSpaceAfter = new LinkedList<String>(Arrays.asList("(", "[", "{", "\"", ""));
        StringBuilder sentence = new StringBuilder();
        tokens.add(0, ""); // Add an empty token at the beginning: the loop checks position i-1, and "" is in noSpaceAfter
        for (int i = 1; i < tokens.size(); i++) {
            if (noSpaceBefore.contains(tokens.get(i))
                    || noSpaceAfter.contains(tokens.get(i - 1))) {
                sentence.append(tokens.get(i));
            } else {
                sentence.append(" " + tokens.get(i));
            }
            // Assumption: opening double quotes are always followed by matching closing double quotes.
            // This block switches the " to the other set after each occurrence,
            // i.e. the first double quote gets no space after it, the second no space before it.
            if ("\"".equals(tokens.get(i - 1))) {
                if (noSpaceAfter.contains("\"")) {
                    noSpaceAfter.remove("\"");
                    noSpaceBefore.add("\"");
                } else {
                    noSpaceAfter.add("\"");
                    noSpaceBefore.remove("\"");
                }
            }
        }
        return sentence.toString();
    }
}
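As a quick standalone sanity check of the spacing rules (a sketch, not part of the original code: the helper below inlines the same joining logic so it compiles on its own, and DetokenizerDemo is just an illustrative name):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class DetokenizerDemo {
    // Same joining logic as the Detokenizer above, inlined so the demo is self-contained.
    static String detokenize(List<String> tokens) {
        List<String> noSpaceBefore = new LinkedList<>(Arrays.asList(
                ",", ".", ";", ":", ")", "}", "]", "'", "'s", "n't"));
        List<String> noSpaceAfter = new LinkedList<>(Arrays.asList(
                "(", "[", "{", "\"", ""));
        StringBuilder sentence = new StringBuilder();
        // Copy into a mutable list before inserting the "" sentinel,
        // so callers may pass fixed-size lists such as Arrays.asList(...).
        List<String> toks = new ArrayList<>(tokens);
        toks.add(0, "");
        for (int i = 1; i < toks.size(); i++) {
            if (noSpaceBefore.contains(toks.get(i)) || noSpaceAfter.contains(toks.get(i - 1))) {
                sentence.append(toks.get(i));
            } else {
                sentence.append(' ').append(toks.get(i));
            }
        }
        return sentence.toString();
    }

    public static void main(String[] args) {
        // prints: He runs, doesn't he.
        System.out.println(detokenize(Arrays.asList("He", "runs", ",", "does", "n't", "he", ".")));
    }
}
```

Note that clitic tokens like "n't" and "'s" (as produced by the PTB tokenizer) are reattached without a space, which is why they appear in the noSpaceBefore list.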
Another class file:
import java.io.*;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;
public class PlainTextCorefResolver {
    public static void resolveFile(File inputFile, File outputFile) {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(inputFile), Charset.forName("UTF-8")));
            PrintWriter writer = new PrintWriter(outputFile, "UTF-8");
            if (inputFile.exists()) System.out.println("input exist\n");
            else System.out.println("input not exist\n");
            if (outputFile.exists()) System.out.println("output exist\n");
            else System.out.println("output not exist\n");
            while (true) {
                String line = reader.readLine();
                // EOF
                if (line == null)
                    break;
                // Resolve line
                List<String> tokenToReplace = Arrays.asList("He", "he", "She", "she", "It", "it", "They", "they"); //!!!
                String resolvedLine = CorefResolution.corefResolute(line, tokenToReplace);
                writer.println(resolvedLine);
            }
            reader.close();
            writer.close();
        } catch (Exception e) {
            System.err.println("Failed to open/resolve input file [" + inputFile.getAbsoluteFile() + "] in loader");
            e.printStackTrace();
            return;
        }
    }

    public static void main(String[] args) {
        String inputFileName = "path/file.txt";
        String outputFileName = "path/file.resolved.txt";
        File inputFile = new File(inputFileName);
        File outputFile = new File(outputFileName);
        resolveFile(inputFile, outputFile);
    }
}
However, it does not produce any useful result. corefClusterId is always null, so all I get is a stream of "NULL NULL NULL" output.
How can I correctly perform coreference resolution so that "He/he/She/she/It/it/the stadium/..." is replaced with the most representative mention (a person or organization name)?
For example, given: "Estadio El Madrigal is a stadium in Spain, in use since 1923. It is currently used mostly for football matches." I would like to get: "Estadio El Madrigal is a stadium in Spain, in use since 1923. Estadio El Madrigal is currently used mostly for football matches."
Best Answer
I don't think our coref system links "Estadio El Madrigal" to "It" in your example.
Here is some sample code for accessing CorefChains and mentions in general.
import edu.stanford.nlp.hcoref.CorefCoreAnnotations;
import edu.stanford.nlp.hcoref.data.CorefChain;
import edu.stanford.nlp.hcoref.data.Mention;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.*;
public class CorefExample {
    public static void main(String[] args) throws Exception {
        Annotation document = new Annotation("John Kerry is the secretary of state. He ran for president in 2004.");
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,mention,coref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(document);
        System.out.println("---");
        System.out.println("coref chains");
        for (CorefChain cc : document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
            System.out.println("\t" + cc);
            System.out.println(cc.getMentionMap());
            List<CorefChain.CorefMention> corefMentions = cc.getMentionsInTextualOrder();
            for (CorefChain.CorefMention cm : corefMentions) {
                System.out.println("---");
                System.out.println("full text: " + cm.mentionSpan);
                System.out.println("position: " + cm.position);
                System.out.println("start index of first word: " + cm.startIndex);
            }
        }
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            System.out.println("---");
            System.out.println("mentions");
            for (Mention m : sentence.get(CorefCoreAnnotations.CorefMentionsAnnotation.class)) {
                System.out.println("\t" + m);
            }
        }
    }
}
======================
UPDATE
@StanfordNLPHelper, I get the following error when using "coref" instead of "dcoref":
INFO: Read 25 rules
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...
done [0.3 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator mention
Using mention detector type: rule
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3079)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2874)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1639)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at java.util.HashMap.readObject(HashMap.java:1394)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at edu.stanford.nlp.io.IOUtils.readObjectFromURLOrClasspathOrFileSystem(IOUtils.java:324)
at edu.stanford.nlp.scoref.SimpleLinearClassifier.<init>(SimpleLinearClassifier.java:30)
at edu.stanford.nlp.scoref.PairwiseModel.<init>(PairwiseModel.java:75)
at edu.stanford.nlp.scoref.PairwiseModel$Builder.build(PairwiseModel.java:57)
at edu.stanford.nlp.scoref.ClusteringCorefSystem.<init>(ClusteringCorefSystem.java:31)
at edu.stanford.nlp.scoref.StatisticalCorefSystem.fromProps(StatisticalCorefSystem.java:48)
at edu.stanford.nlp.pipeline.CorefAnnotator.<init>(CorefAnnotator.java:66)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.coref(AnnotatorImplementations.java:220)
at edu.stanford.nlp.pipeline.AnnotatorFactories$13.create(AnnotatorFactories.java:515)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:85)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:375)
Process finished with exit code 1
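The stack trace shows the JVM running out of heap while deserializing the statistical coref model; the "coref" annotator's models are considerably larger than "dcoref"'s. One common remedy (a hedged suggestion, and the class name below is a placeholder for whatever main class is being launched) is to raise the maximum heap size:

```shell
# Give the JVM more heap; the statistical coref models typically need several GB.
# PlainTextCorefResolver is the entry point from the question; adjust the classpath to your setup.
java -Xmx5g -cp "stanford-corenlp-full-2015-12-09/*:." PlainTextCorefResolver
```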
Regarding java - coreference resolution with Stanford CoreNLP, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36204856/