java - 使用斯坦福 Tregex 提取子树-6ren

java - 使用斯坦福 Tregex 提取子树

转载作者：行者123 更新时间：2023-12-01 15:48:29

25

4

我创建了一个使用 Tregex 提取子树的类。我使用了“TregexPattern.java”中的一些代码片段，因为我不想让程序使用控制台命令。

一般来说，有一个句子的树，我想提取某些子树(没有用户交互)。

到目前为止我所做的如下:

package edu.stanford.nlp.trees.tregex;
import edu.stanford.nlp.ling.StringLabelFactory;
import edu.stanford.nlp.trees.*;
import java.io.*;
import java.util.*;
public abstract class Test {
    abstract TregexMatcher matcher(Tree root, Tree tree, Map<String, Tree> namesToNodes, VariableStrings variableStrings);
    public TregexMatcher matcher(Tree t) {
        return matcher(t, t, new HashMap<String, Tree>(), new VariableStrings());
    }
    public static void main(String[] args) throws ParseException, IOException {
        String encoding = "UTF-8";
        TregexPattern p = TregexPattern.compile("NP < NN & <<DT"); //"/^MWV/" or "NP < (NP=np < NNS)"
        TreeReader r = new PennTreeReader(new StringReader("(VP (VP (VBZ Try) (NP (NP (DT this) (NN wine)) (CC and) (NP (DT these) (NNS snails)))) (PUNCT .))"), new LabeledScoredTreeFactory(new StringLabelFactory()));
        Tree t = r.readTree();
        treebank = new MemoryTreebank();
        treebank.add(t);
        TRegexTreeVisitor vis = new TRegexTreeVisitor(p, encoding);
        **treebank.apply(vis);  //line 26**
        if (TRegexTreeVisitor.printMatches) {
            System.out.println("There were " + vis.numMatches() + " matches in total.");
        }
    }
    private static Treebank treebank; // used by main method, must be accessible
    static class TRegexTreeVisitor implements TreeVisitor {
        private static boolean printNumMatchesToStdOut = false;
        static boolean printNonMatchingTrees = false;
        static boolean printSubtreeCode = false;
        static boolean printTree = false;
        static boolean printWholeTree = false;
        static boolean printMatches = true;
        static boolean printFilename = false;
        static boolean oneMatchPerRootNode = false;
        static boolean reportTreeNumbers = false;
        static TreePrint tp;
        PrintWriter pw;
        int treeNumber = 0;
        TregexPattern p;
        //String[] handles;
        int numMatches;
        TRegexTreeVisitor(TregexPattern p, String encoding) {
            this.p = p;
            //this.handles = handles;
            try {
                pw = new PrintWriter(new OutputStreamWriter(System.out, encoding), true);
            } catch (UnsupportedEncodingException e) {
                System.err.println("Error -- encoding " + encoding + " is unsupported.  Using ASCII print writer instead.");
                pw = new PrintWriter(System.out, true);
            }
            // tp.setPrintWriter(pw);
        }
        public void visitTree(Tree t) {
            treeNumber++;
            if (printTree) {
                pw.print(treeNumber + ":");
                pw.println("Next tree read:");
                tp.printTree(t, pw);
            }
            TregexMatcher match = p.matcher(t);
            if (printNonMatchingTrees) {
                if (match.find()) {
                    numMatches++;
                } else {
                    tp.printTree(t, pw);
                }
                return;
            }
            Tree lastMatchingRootNode = null;
            while (match.find()) {
                if (oneMatchPerRootNode) {
                    if (lastMatchingRootNode == match.getMatch()) {
                        continue;
                    } else {
                        lastMatchingRootNode = match.getMatch();
                    }
                }
                numMatches++;
                if (printFilename && treebank instanceof DiskTreebank) {
                    DiskTreebank dtb = (DiskTreebank) treebank;
                    pw.print("# ");
                    pw.println(dtb.getCurrentFile());
                }
                if (printSubtreeCode) {
                    pw.println(treeNumber + ":" + match.getMatch().nodeNumber(t));
                }
                if (printMatches) {
                    if (reportTreeNumbers) {
                        pw.print(treeNumber + ": ");
                    }
                    if (printTree) {
                        pw.println("Found a full match:");
                    }
                    if (printWholeTree) {
                        tp.printTree(t, pw);
                    } else {
                        **tp.printTree(match.getMatch(), pw);  //line 108**
                    }
                    // pw.println();  // TreePrint already puts a blank line in
                } // end if (printMatches)
            } // end while match.find()
        } // end visitTree
        public int numMatches() {
            return numMatches;
        }
    } // end class TRegexTreeVisitor
}

但它给出以下错误:

Exception in thread "main" java.lang.NullPointerException
        at edu.stanford.nlp.trees.tregex.Test$TRegexTreeVisitor.visitTree(Test.java:108)
        at edu.stanford.nlp.trees.MemoryTreebank.apply(MemoryTreebank.java:376)
        at edu.stanford.nlp.trees.tregex.Test.main(Test.java:26)
Java Result: 1

有什么修改或想法吗？

最佳答案

NullPointerException 通常是软件错误的指示器。

我过去也有同样的任务。使用依存解析器对句子进行解析。我决定将生成的解析树放入 XML(DOM) 中并对其执行 XPath 查询。

为了提高性能，您不需要将 xml 放入 String 中，只需将所有 XML 结构作为 DOM 保留在内存中(例如 http://www.ibm.com/developerworks/xml/library/x-domjava/ )。

使用 XPath 查询树状数据结构给我带来了以下好处:

轻松加载/保存/传输句子解析结果。
XPath 的强大语法/功能。
很多人都知道 XPath(每个人都可以自定义您的查询)。
XML 和 XPath 是跨平台的。
大量稳定的 XPath 和 XML/DOM 库实现。
能够使用 XSLT。
与现有的基于 XML 的管道集成XSLT+XPath -> XSD -> 执行操作 (例如，用户已在自由空间内的某个位置指定了他们的电子邮件地址和操作)文字投诉)。

关于java - 使用斯坦福 Tregex 提取子树，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/6624479/

25

4

0

文章推荐： java - 我们可以在 SQLite 数据库中创建和访问多个表吗？

文章推荐： java - 转换 XML 时出错

文章推荐： Java Web应用程序，如何让客户端保存生成的.txt文件？

树结构之MongoDb 使用的到底是 B 树，还是 B+ 树？
关于 B 树与 B+ 树，网上有一个比较经典的问题：为什么 MongoDb 使用 B 树，而 MySQL 索引使用 B+ 树? 但实际上 MongoDb 真的用的是 B 树吗?
c# - 持久(基于磁盘)R 树(或 R* 树)
如何将 R* Tree 实现为持久(基于磁盘)树？保存 R* 树索引或保存叶值的文件的体系结构是什么？注意:此外，如何在这种持久性 R* 树中执行插入、更新和删除操作？注意事项二:我已经实现了一个
java - 给定另一个 AST 树，在 Java 中创建一个 AST 树
目前，我正在努力用 Java 表示我用 SML 编写的 AST 树，这样我就可以随时用 Java 遍历它。我想知道是否应该在 Java 中创建一个 Node 类，其中包含我想要表示的数据，以及一个数
c++ - C++ 中任何好的范围查询库(使用 K-D 树、四叉树或 R 树)
我之前用过这个库http://www.cs.umd.edu/~mount/ANN/ .但是，它们不提供范围查询实现。我猜是否有一个 C++ 范围查询实现(圆形或矩形)，用于查询二维数据。谢谢。最佳
为什么MySQL数据库索引选择使用B+树?
在进一步分析为什么MySQL数据库索引选择使用B+树之前，我相信很多小伙伴对数据结构中的树还是有些许模糊的，因此我们由浅入深一步步探讨树的演进过程，在一步步引出B树以及为什么MySQL数据库索引选择
数据结构-树，三探之代码实现
书接上回，今天和大家一起动手来自己实现树。相信通过前面的章节学习，大家已经明白树是什么了，今天我们主要针对二叉树，分别使用顺序存储和链式存储来实现树。 01、数组实现我们在上一节中说过，
数据结构-树，再探
书节上回，我们接着聊二叉树，N叉树，以及树的存储。 01、满二叉树如果一个二叉树，除最后一层节点外，每一层的节点数都达到最大值，即每个节点都有两个子节点，同时所有叶子节点都在最后一层，则这个
数据结构-树，初探
树是一种非线性数据结构，是以分支关系定义的层次结构，因此形态上和自然界中的倒挂的树很像，而数据结构中树根向上树叶向下。什么是树？ 01、定义树是由n（n>=0）个元素节点组成的
操作系统的那棵“树”---06
操作系统的那棵“树” 今天从一颗开始，我们看看如何从小树苗长成一颗苍天大树。运转CPU CPU运转起来很简单，就是不断的从内存取值执行。 CPU没有好好运转 IO是个耗费时间的活，如果CPU在取值
r - 从物种列表制作简单的系统发育树状图(树)
我想为海洋生物学类(class)制作一个简单的系统发育树作为教育示例。我有一个具有分类等级的物种列表: Group <- c("Benthos","Benthos","Benthos","Be
c++ - 树，无法正确删除节点
我从这段代码中删除节点时遇到问题，如果我插入数字 12 并尝试删除它，它不会删除它，我尝试调试，似乎当它尝试删除时，它出错了树的。但是，如果我尝试删除它已经插入主节点的节点，它将删除它，或者我插入数字
haskell - 如何在Haskell中实现B+树？
B+ 树的叶节点链接在一起。将 B+ 树的指针结构视为有向图，它不是循环的。但是忽略指针的方向并将其视为链接在一起的无向叶节点会在图中创建循环。在 Haskell 中，如何将叶子构造为父内部节点的子
GWT 树，开幕事件
我在 GWT 中使用树控件。我有一个自定义小部件，我将其添加为 TreeItem: Tree testTree = new Tree(); testTree.addItem(myWidget); 我想
c - 树/链表结构的遍历
它有点像混合树/链表结构。这是我定义结构的方式 struct node { nodeP sibling; nodeP child; nodeP parent; char
c - 树:使用队列进行层序遍历
我编写了使用队列遍历树的代码，但是下面的出队函数生成错误，head = p->next 是否有问题？我不明白为什么这部分是错误的。 void Levelorder(void) { node *tmp,
javascript - 将平面数组解析为嵌套结构(树)
例如，我想解析以下数组: var array1 = ["a.b.c.d", "a.e.f.g", "a.h", "a.i.j", "a.b.k"] 进入: var json1 = { "nod
java - 树-路径总和
问题 -> 给定一棵二叉树和一个和，确定该树是否具有从根到叶的路径，使得沿路径的所有值相加等于给定的和。我的解决方案 -> public class Solution { public bo
带有列的 Java 树
我有一个创建 java 树的任务，它包含三列:运动名称、运动类别中的运动计数和上次更新。类似的东西显示在下面的图像上: 如您所见，有 4 种运动:水上运动、球类运动、跳伞运动和舞蹈运动。当我展开 sk
mysql - H2数据库中的B+树
我想在 H2 数据库中实现 B+ Tree，但我想知道，B+ Tree 功能在 H2 数据库中可用吗？最佳答案 H2 已经使用了 B+ 树(PageBtree 类)。关于mysql - H2数据库
java - 字符串数组(树)
假设我们有 5 个字符串数组: String[] array1 = {"hello", "i", "cat"}; String[] array2 = {"hello", "i", "am"}; Str

首页

博学

6Ren·AI

商城

java - 使用斯坦福 Tregex 提取子树