java - How to replace a word by its most representative mention using the Stanford CoreNLP Coreferences module

Reposted by: 搜寻专家. Updated: 2023-10-30 21:28:14


The idea is to rewrite a sentence such as:

John drove to Judy’s house. He made her dinner.

into

John drove to Judy’s house. John made Judy dinner.

Here is the code I have been fooling around with:

    private void doTest(String text) {
        Annotation doc = new Annotation(text);
        pipeline.annotate(doc);

        Map<Integer, CorefChain> corefs = doc.get(CorefChainAnnotation.class);
        List<CoreMap> sentences = doc.get(CoreAnnotations.SentencesAnnotation.class);

        List<String> resolved = new ArrayList<String>();

        for (CoreMap sentence : sentences) {
            List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);

            for (CoreLabel token : tokens) {
                Integer corefClustId = token.get(CorefCoreAnnotations.CorefClusterIdAnnotation.class);
                System.out.println(token.word() + " --> corefClusterID = " + corefClustId);

                CorefChain chain = corefs.get(corefClustId);
                System.out.println("matched chain = " + chain);

                if (chain == null) {
                    resolved.add(token.word());
                } else {
                    int sentIndex = chain.getRepresentativeMention().sentNum - 1;
                    CoreMap corefSentence = sentences.get(sentIndex);
                    List<CoreLabel> corefSentenceTokens = corefSentence.get(TokensAnnotation.class);

                    String newwords = "";
                    CorefMention reprMent = chain.getRepresentativeMention();
                    System.out.println(reprMent);
                    for (int i = reprMent.startIndex; i < reprMent.endIndex; i++) {
                        CoreLabel matchedLabel = corefSentenceTokens.get(i - 1);
                        resolved.add(matchedLabel.word());
                        newwords += matchedLabel.word() + " ";
                    }

                    System.out.println("converting " + token.word() + " to " + newwords);
                }

                System.out.println();
                System.out.println();
                System.out.println("-----------------------------------------------------------------");
            }
        }

        String resolvedStr = "";
        System.out.println();
        for (String str : resolved) {
            resolvedStr += str + " ";
        }
        System.out.println(resolvedStr);
    }

The best output I can get right now is

John drove to Judy 's 's Judy 's house . John made Judy 's her dinner .

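which is not very brilliant...

I am pretty sure there is a much simpler way to achieve what I am after.

Ideally, I would like to reassemble the sentence as a list of CoreLabels, so that I can keep the other data attached to them.

Any help is appreciated.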

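Best answer

The challenge is that you need to make sure the token is not already part of its representative mention. For example, the token "Judy" has "Judy 's" as its representative mention, so if you replace it with the phrase "Judy 's", you end up with a doubled "'s".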

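You can check whether the token is part of its representative mention by comparing positions. Replace the token only if it lies outside the mention: it is in a different sentence than the mention, or its index is smaller than the mention's startIndex, or it is at or beyond the mention's endIndex (endIndex is exclusive, which is why the loop runs while i < reprMent.endIndex). Otherwise, just keep the token. The sentence comparison matters because token indices restart at 1 in every sentence.

The relevant part of your code would then look like this:

            // The token is inside the representative mention only if it is in the
            // same sentence (sentNum is 1-based; sentIndex() is assumed 0-based,
            // as set by the sentence splitter) and within [startIndex, endIndex).
            boolean insideReprMent = token.sentIndex() + 1 == reprMent.sentNum
                    && token.index() >= reprMent.startIndex
                    && token.index() < reprMent.endIndex;

            if (!insideReprMent) {
                // Replace the token with the words of the representative mention.
                for (int i = reprMent.startIndex; i < reprMent.endIndex; i++) {
                    CoreLabel matchedLabel = corefSentenceTokens.get(i - 1);
                    resolved.add(matchedLabel.word());
                    newwords += matchedLabel.word() + " ";
                }
            } else {
                // The token already belongs to the mention: keep it unchanged.
                resolved.add(token.word());
            }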

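In addition, to speed things up, you can replace the first if condition with:

    if (chain == null || chain.getMentionsInTextualOrder().size() == 1)

After all, if the coreference chain contains only one mention, there is no point in looking up its representative mention.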
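The position check from the answer can be tried out without CoreNLP. The following is a minimal sketch, not the library's API: the `Mention` class is a hypothetical stand-in for `CorefMention`, and the table of representative mentions is written by hand rather than produced by the coreference annotator. It assumes 1-based sentence numbers and token indices, an exclusive `endIndex`, and it compares sentence numbers as well, since token indices restart in every sentence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MentionReplacementSketch {

    /** Hypothetical stand-in for CorefMention: 1-based sentNum and startIndex, exclusive endIndex. */
    static final class Mention {
        final int sentNum;
        final int startIndex;
        final int endIndex;

        Mention(int sentNum, int startIndex, int endIndex) {
            this.sentNum = sentNum;
            this.startIndex = startIndex;
            this.endIndex = endIndex;
        }
    }

    /** True if the token at (sentNum, index), both 1-based, lies inside the mention span. */
    static boolean insideMention(int sentNum, int index, Mention m) {
        return sentNum == m.sentNum && index >= m.startIndex && index < m.endIndex;
    }

    /** Words covered by the mention, read from the tokenized sentences. */
    static List<String> mentionWords(List<List<String>> sentences, Mention m) {
        return sentences.get(m.sentNum - 1).subList(m.startIndex - 1, m.endIndex - 1);
    }

    /**
     * Rebuilds the text, replacing each token that has a representative mention
     * (reprMentionOf[s][t] != null) and lies outside that mention.
     */
    static List<String> resolve(List<List<String>> sentences, Mention[][] reprMentionOf) {
        List<String> resolved = new ArrayList<>();
        for (int s = 0; s < sentences.size(); s++) {
            List<String> tokens = sentences.get(s);
            for (int t = 0; t < tokens.size(); t++) {
                Mention m = reprMentionOf[s][t];
                if (m == null || insideMention(s + 1, t + 1, m)) {
                    resolved.add(tokens.get(t));                 // keep the token as is
                } else {
                    resolved.addAll(mentionWords(sentences, m)); // substitute the mention
                }
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("John", "drove", "to", "Judy", "'s", "house", "."),
                Arrays.asList("He", "made", "her", "dinner", "."));
        Mention john = new Mention(1, 1, 2); // "John"
        Mention judy = new Mention(1, 4, 6); // "Judy 's"
        // Hand-written assumption: which token belongs to which coreference chain.
        Mention[][] reprMentionOf = {
                {john, null, null, judy, judy, null, null},
                {john, null, judy, null, null}
        };
        // prints: John drove to Judy 's house . John made Judy 's dinner .
        System.out.println(String.join(" ", resolve(sentences, reprMentionOf)));
    }
}
```

Because the representative mention here is "Judy 's", "her" still becomes "Judy 's" rather than "Judy"; trimming the possessive marker would need an extra step that this sketch does not attempt.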

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/30182138/
