java - Creating another train.txt to train a sentiment model for another domain

Reposted by: 行者123 | Updated: 2023-12-02 03:13:58

I found that the data in train.txt used to train the sentiment model is in PTB format, as shown below.

(3 (2 Yet) (3 (2 (2 the) (2 act)) (3 (4 (3 (2 is) (3 (2 still) (4 charming))) (2 here)) (2 .))))

The actual sentence should be

Yet the act is still charming here.
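
The link between a train.txt line and its sentence can be checked without CoreNLP. The following is a minimal sketch, assuming each line is one well-formed bracketed tree in which every node starts with a sentiment label and every leaf has the form `(label word)`. The class name `TrainLineYield` and the method `sentence` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrainLineYield {

    // Splits a bracketed tree into "(", ")" and whitespace-free tokens.
    private static final Pattern TOKEN = Pattern.compile("\\(|\\)|[^\\s()]+");

    /** Returns the leaf words of one train.txt line, joined by single spaces. */
    public static String sentence(String line) {
        List<String> words = new ArrayList<>();
        boolean expectLabel = false; // true right after "(": the next token is a sentiment label
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            String tok = m.group();
            if (tok.equals("(")) {
                expectLabel = true;
            } else if (tok.equals(")")) {
                expectLabel = false;
            } else if (expectLabel) {
                expectLabel = false; // skip the label itself
            } else {
                words.add(tok);      // any other token is a leaf word
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String line = "(3 (2 Yet) (3 (2 (2 the) (2 act)) (3 (4 (3 (2 is) "
                + "(3 (2 still) (4 charming))) (2 here)) (2 .))))";
        System.out.println(sentence(line)); // prints "Yet the act is still charming here ."
    }
}
```

Note that the recovered sentence is tokenized, so the final period is separated from "here" by a space.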

But after parsing I get a different structure

(ROOT (S (CC Yet) (NP (DT the) (NN act)) (VP (VBZ is) (ADJP (RB still) (JJ charming)) (ADVP (RB here))) (. .)))

Here is my code:

public static void main(String args[]){
    // creates a StanfordCoreNLP object with tokenization, sentence splitting, and constituency parsing
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit,parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // read some text in the text variable
    String text = "Yet the act is still charming here .";// Add your text here!

    // create an empty Annotation just with the given text
    Annotation annotation = new Annotation(text);

    // run all Annotators on this text

    pipeline.annotate(annotation);

    // these are all the sentences in this document
    // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);

    // int sentiment = 0;
    for(CoreMap sentence: sentences) {
        // traversing the words in the current sentence
        Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
        System.out.println(tree);
        // System.out.println(tree.yield());
        tree.pennPrint(System.out);
        // Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
        // sentiment = RNNCoreAnnotations.getPredictedClass(tree);
    }

    // System.out.print(sentiment);
}

Then two problems come up when I create train.txt from my own sentences.

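1. My tree differs from the trees in train.txt. I know the numbers in the latter are sentiment polarity labels, but the tree structure itself also seems different. I want to get a binarized parse tree, which might look like this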

((Yet) (((the) (act)) ((((is) ((still) (charming))) (here)) (.))))

Once I have the sentiment numbers, I can fill them in to get my own train.txt

2. How do I get the phrases for every node of the binarized parse tree? In this example, I should get

Yet
the 
act
the act
is
still 
charming 
still charming 
is still charming
here
is still charming here
.
is still charming here .
the act is still charming here .
Yet the act is still charming here.

Once I have them, I can pay human annotators to label them.

I have actually googled this a lot but couldn't solve it, so I'm posting here. Any helpful answer would be greatly appreciated!

Best answer

Add this to the properties to get binary trees:

props.setProperty("parse.binaryTrees", "true");

The binary tree for the sentence can then be accessed with:

Tree tree = sentence.get(TreeCoreAnnotations.BinarizedTreeAnnotation.class);

Here is some sample code I wrote:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.*;

import java.util.ArrayList;
import java.util.Properties;

public class SubTreesExample {

    public static void printSubTrees(Tree inputTree, String spacing) {
        if (inputTree.isLeaf()) {
            return;
        }
        ArrayList<Word> words = new ArrayList<Word>();
        for (Tree leaf : inputTree.getLeaves()) {
            words.addAll(leaf.yieldWords());
        }
        System.out.print(spacing+inputTree.label()+"\t");
        for (Word w : words) {
            System.out.print(w.word()+ " ");
        }
        System.out.println();
        for (Tree subTree : inputTree.children()) {
            printSubTrees(subTree, spacing + " ");
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
        props.setProperty("parse.binaryTrees", "true");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "Yet the act is still charming here.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        Tree sentenceTree = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0).get(
                TreeCoreAnnotations.BinarizedTreeAnnotation.class);
        System.out.println("Penn tree:");
        sentenceTree.pennPrint(System.out);
        System.out.println();
        System.out.println("Phrases:");
        printSubTrees(sentenceTree, "");

    }
}
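
Once annotators have labeled the phrases, the "fill in the numbers" step from the question could look roughly like the sketch below. It does not use CoreNLP. It assumes the binarized tree is available as an unlabeled bracketed string like the one in the question, that the input is well formed, and that each phrase maps to exactly one label (so a phrase repeated within a sentence gets the same label everywhere). `LabelFiller`, `Node`, and `fill` are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LabelFiller {

    /** Minimal tree node: a leaf holds a word, an inner node holds children. */
    static final class Node {
        String word; // non-null only for leaves
        final List<Node> children = new ArrayList<>();

        /** The words under this node, joined by single spaces. */
        String phrase() {
            if (word != null) return word;
            List<String> parts = new ArrayList<>();
            for (Node c : children) parts.add(c.phrase());
            return String.join(" ", parts);
        }
    }

    private final String s;
    private int pos = 0;

    private LabelFiller(String s) { this.s = s; }

    private void skipSpaces() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    private void expect(char c) {
        if (pos >= s.length() || s.charAt(pos) != c)
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        pos++;
    }

    // Parses one node: "(" followed by either a single word or one or more nodes, then ")".
    private Node parse() {
        skipSpaces();
        expect('(');
        skipSpaces();
        Node n = new Node();
        if (pos < s.length() && s.charAt(pos) == '(') {
            while (pos < s.length() && s.charAt(pos) == '(') {
                n.children.add(parse());
                skipSpaces();
            }
        } else {
            int start = pos;
            while (pos < s.length() && s.charAt(pos) != ')'
                    && !Character.isWhitespace(s.charAt(pos))) pos++;
            n.word = s.substring(start, pos);
            skipSpaces();
        }
        expect(')');
        return n;
    }

    // Writes the node in train.txt style, looking up each node's phrase in the label map.
    private static String render(Node n, Map<String, Integer> labels) {
        String phrase = n.phrase();
        Integer label = labels.get(phrase);
        if (label == null) throw new IllegalArgumentException("No label for phrase: " + phrase);
        if (n.word != null) return "(" + label + " " + n.word + ")";
        StringBuilder sb = new StringBuilder("(" + label);
        for (Node c : n.children) sb.append(' ').append(render(c, labels));
        return sb.append(')').toString();
    }

    /** Turns an unlabeled bracketed tree into a labeled train.txt line. */
    public static String fill(String unlabeled, Map<String, Integer> labels) {
        return render(new LabelFiller(unlabeled).parse(), labels);
    }

    public static void main(String[] args) {
        Map<String, Integer> labels = new HashMap<>();
        labels.put("still", 2);
        labels.put("charming", 4);
        labels.put("still charming", 3);
        System.out.println(fill("((still) (charming))", labels)); // prints "(3 (2 still) (4 charming))"
    }
}
```

Keying the labels by phrase text keeps the annotation step independent of tree positions, at the cost of not being able to give the same phrase different labels in different contexts.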

Regarding java - creating another train.txt to train a sentiment model for another domain, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40604719/
