java - Extracting subtrees using Stanford Tregex

Reposted. Author: 行者123. Updated: 2023-12-01 15:48:29

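I have created a class that uses Tregex to extract subtrees. I reused some code snippets from "TregexPattern.java", because I do not want the program to depend on console commands.

In general, given the parse tree of a sentence, I want to extract certain subtrees (without user interaction).

What I have done so far is the following: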

package edu.stanford.nlp.trees.tregex;
import edu.stanford.nlp.ling.StringLabelFactory;
import edu.stanford.nlp.trees.*;
import java.io.*;
import java.util.*;
public abstract class Test {
    abstract TregexMatcher matcher(Tree root, Tree tree, Map<String, Tree> namesToNodes, VariableStrings variableStrings);
    public TregexMatcher matcher(Tree t) {
        return matcher(t, t, new HashMap<String, Tree>(), new VariableStrings());
    }
    public static void main(String[] args) throws ParseException, IOException {
        String encoding = "UTF-8";
        TregexPattern p = TregexPattern.compile("NP < NN & <<DT"); //"/^MWV/" or "NP < (NP=np < NNS)"
        TreeReader r = new PennTreeReader(new StringReader("(VP (VP (VBZ Try) (NP (NP (DT this) (NN wine)) (CC and) (NP (DT these) (NNS snails)))) (PUNCT .))"), new LabeledScoredTreeFactory(new StringLabelFactory()));
        Tree t = r.readTree();
        treebank = new MemoryTreebank();
        treebank.add(t);
        TRegexTreeVisitor vis = new TRegexTreeVisitor(p, encoding);
        treebank.apply(vis); // line 26
        if (TRegexTreeVisitor.printMatches) {
            System.out.println("There were " + vis.numMatches() + " matches in total.");
        }
    }
    private static Treebank treebank; // used by main method, must be accessible
    static class TRegexTreeVisitor implements TreeVisitor {
        private static boolean printNumMatchesToStdOut = false;
        static boolean printNonMatchingTrees = false;
        static boolean printSubtreeCode = false;
        static boolean printTree = false;
        static boolean printWholeTree = false;
        static boolean printMatches = true;
        static boolean printFilename = false;
        static boolean oneMatchPerRootNode = false;
        static boolean reportTreeNumbers = false;
        static TreePrint tp;
        PrintWriter pw;
        int treeNumber = 0;
        TregexPattern p;
        //String[] handles;
        int numMatches;
        TRegexTreeVisitor(TregexPattern p, String encoding) {
            this.p = p;
            //this.handles = handles;
            try {
                pw = new PrintWriter(new OutputStreamWriter(System.out, encoding), true);
            } catch (UnsupportedEncodingException e) {
                System.err.println("Error -- encoding " + encoding + " is unsupported. Using ASCII print writer instead.");
                pw = new PrintWriter(System.out, true);
            }
            // tp.setPrintWriter(pw);
        }
        public void visitTree(Tree t) {
            treeNumber++;
            if (printTree) {
                pw.print(treeNumber + ":");
                pw.println("Next tree read:");
                tp.printTree(t, pw);
            }
            TregexMatcher match = p.matcher(t);
            if (printNonMatchingTrees) {
                if (match.find()) {
                    numMatches++;
                } else {
                    tp.printTree(t, pw);
                }
                return;
            }
            Tree lastMatchingRootNode = null;
            while (match.find()) {
                if (oneMatchPerRootNode) {
                    if (lastMatchingRootNode == match.getMatch()) {
                        continue;
                    } else {
                        lastMatchingRootNode = match.getMatch();
                    }
                }
                numMatches++;
                if (printFilename && treebank instanceof DiskTreebank) {
                    DiskTreebank dtb = (DiskTreebank) treebank;
                    pw.print("# ");
                    pw.println(dtb.getCurrentFile());
                }
                if (printSubtreeCode) {
                    pw.println(treeNumber + ":" + match.getMatch().nodeNumber(t));
                }
                if (printMatches) {
                    if (reportTreeNumbers) {
                        pw.print(treeNumber + ": ");
                    }
                    if (printTree) {
                        pw.println("Found a full match:");
                    }
                    if (printWholeTree) {
                        tp.printTree(t, pw);
                    } else {
                        tp.printTree(match.getMatch(), pw); // line 108
                    }
                    // pw.println(); // TreePrint already puts a blank line in
                } // end if (printMatches)
            } // end while match.find()
        } // end visitTree
        public int numMatches() {
            return numMatches;
        }
    } // end class TRegexTreeVisitor
}

But it fails with the following error:

Exception in thread "main" java.lang.NullPointerException
at edu.stanford.nlp.trees.tregex.Test$TRegexTreeVisitor.visitTree(Test.java:108)
at edu.stanford.nlp.trees.MemoryTreebank.apply(MemoryTreebank.java:376)
at edu.stanford.nlp.trees.tregex.Test.main(Test.java:26)
Java Result: 1

Any suggested changes or ideas?

Best answer

A NullPointerException is usually an indicator of a bug in the software.
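In the question's code, the field `static TreePrint tp;` is declared but never assigned (the only other mention, `tp.setPrintWriter(pw)`, is commented out), and the stack trace points at the call `tp.printTree(match.getMatch(), pw)`. So the most likely cause is that `tp` is still null when `visitTree` runs. The sketch below reproduces that failure mode using only the JDK. `Printer` is a hypothetical stand-in for `TreePrint`, not a CoreNLP class, and the names `unassigned` and `assigned` are made up for illustration.

```java
import java.io.PrintWriter;

class NullStaticFieldDemo {
    /** Hypothetical stand-in for TreePrint: writes a string to a PrintWriter. */
    static class Printer {
        void print(String s, PrintWriter pw) {
            pw.println(s);
        }
    }

    // Declared but never assigned, like `static TreePrint tp;` in the question.
    static Printer unassigned;

    // Assigned at declaration, so it is non-null whenever it is used.
    static Printer assigned = new Printer();

    /** Returns true if calling print() through the given reference throws an NPE. */
    static boolean throwsNpe(Printer p) {
        PrintWriter pw = new PrintWriter(System.out, true);
        try {
            p.print("(NP (DT this) (NN wine))", pw);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unassigned field throws NPE: " + throwsNpe(unassigned));
        System.out.println("assigned field throws NPE: " + throwsNpe(assigned));
    }
}
```

Applied to the question, the same idea would mean giving `tp` a value before `treebank.apply(vis)` is called, for example by constructing a `TreePrint` with an output format such as "penn" (check the exact constructor against the CoreNLP version in use), or by printing the matched subtree some other way.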

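I had the same task in the past. The sentences were parsed with a dependency parser. I decided to put the resulting parse tree into XML (DOM) and run XPath queries over it.

To improve performance, you do not need to serialize the XML into a String; just keep all the XML structures in memory as a DOM (see, for example, http://www.ibm.com/developerworks/xml/library/x-domjava/ ).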
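As a rough illustration of that point, the sketch below builds a small parse tree directly as a DOM and only turns it into XML text when it needs to be saved or transferred. It uses only the JDK. The element name `node`, the attribute `label`, and the class and method names are assumptions made for this example, not part of the answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class InMemoryParseTree {
    /** Builds the DOM for (NP (DT this) (NN wine)) without going through XML text. */
    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element np = node(doc, "NP", null);
            np.appendChild(node(doc, "DT", "this"));
            np.appendChild(node(doc, "NN", "wine"));
            doc.appendChild(np);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Creates one tree node; word is null for phrasal (non-leaf) nodes. */
    static Element node(Document doc, String label, String word) {
        Element e = doc.createElement("node");
        e.setAttribute("label", label);
        if (word != null) {
            e.setTextContent(word);
        }
        return e;
    }

    /** Serializes the DOM to XML text, only needed for saving or transfer. */
    static String save(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses XML text back into a DOM. */
    static Document load(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = build();          // queries can run on this object directly
        String xml = save(doc);          // text form, produced only on demand
        Document copy = load(xml);
        System.out.println(xml);
        System.out.println(copy.getDocumentElement().getAttribute("label"));
    }
}
```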

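Using XPath to query the tree-like data structure gave me the following benefits:

  1. Parse results are easy to load, save, and transfer.
  2. XPath has a robust syntax and feature set.
  3. Many people know XPath (anyone can customize your queries).
  4. XML and XPath are cross-platform.
  5. There are plenty of stable XPath and XML/DOM library implementations.
  6. XSLT becomes available.
  7. Integration with an existing XML-based pipeline: XSLT+XPath -> XSD -> perform actions (for example, users have specified their email address and the action to take somewhere inside a free-text complaint).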
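The approach described above can be sketched with the JDK alone. The example reads the question's bracketed tree into a DOM and runs an XPath query that plays the role of the Tregex pattern `NP < NN & <<DT` (an NP that has an NN child and a DT descendant). This is a hand-written illustration and does not use Tregex or CoreNLP. The small bracket parser, the `node`/`label` encoding, and all class and method names are assumptions made for the example. Labels are stored in an attribute rather than as element names because some treebank labels (such as `.` or `PRP$`) are not valid XML names.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class PennTreeXPath {
    // XPath counterpart of the Tregex pattern "NP < NN & <<DT".
    static final String NP_WITH_NN_AND_DT =
            "//node[@label='NP'][node[@label='NN']][.//node[@label='DT']]";

    private final String src;
    private final Document doc = newDocument();
    private int pos = 0;

    PennTreeXPath(String src) {
        this.src = src;
        doc.appendChild(parseNode());
    }

    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    /** Reads characters up to the next parenthesis or whitespace. */
    private String readToken() {
        int start = pos;
        while (pos < src.length() && "()".indexOf(src.charAt(pos)) < 0
                && !Character.isWhitespace(src.charAt(pos))) pos++;
        return src.substring(start, pos);
    }

    /** Parses "(LABEL child...)" into <node label="LABEL">; bare words become text. */
    private Element parseNode() {
        skipSpaces();
        if (pos >= src.length() || src.charAt(pos) != '(') {
            throw new IllegalArgumentException("expected '(' at offset " + pos);
        }
        pos++; // consume '('
        Element e = doc.createElement("node");
        e.setAttribute("label", readToken());
        skipSpaces();
        while (pos < src.length() && src.charAt(pos) != ')') {
            if (src.charAt(pos) == '(') {
                e.appendChild(parseNode());
            } else {
                e.appendChild(doc.createTextNode(readToken()));
            }
            skipSpaces();
        }
        pos++; // consume ')'
        return e;
    }

    /** Renders a DOM subtree back into bracketed notation. */
    static String toPenn(Node n) {
        if (n.getNodeType() == Node.TEXT_NODE) return n.getNodeValue();
        StringBuilder sb = new StringBuilder("(").append(((Element) n).getAttribute("label"));
        NodeList kids = n.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) sb.append(' ').append(toPenn(kids.item(i)));
        return sb.append(')').toString();
    }

    /** Evaluates an XPath expression and returns each matching subtree as a string. */
    List<String> query(String xpath) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) out.add(toPenn(hits.item(i)));
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        PennTreeXPath tree = new PennTreeXPath("(VP (VP (VBZ Try) (NP (NP (DT this) (NN wine)) "
                + "(CC and) (NP (DT these) (NNS snails)))) (PUNCT .))");
        for (String hit : tree.query(NP_WITH_NN_AND_DT)) System.out.println(hit);
    }
}
```

For the sentence in the question, only `(NP (DT this) (NN wine))` satisfies the query: the outer NP has no NN child, and the second inner NP contains NNS rather than NN.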

Regarding "java - Extracting subtrees using Stanford Tregex", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6624479/
