
java - Stanford Dependency Parser - How to get spans?

Reposted. Author: 搜寻专家. Updated: 2023-11-01 03:25:08

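I'm doing dependency parsing in Java with the Stanford library. Is there any way to get the indices of each dependency's words back in my original string? I tried calling the getSpan() method, but it returns null for every token: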

LexicalizedParser lp = LexicalizedParser.loadModel(
        "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
        "-maxLength", "80", "-retainTmpSubcategories");
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
Tree parse = lp.apply(text);
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
Collection<TypedDependency> tdl = gs.typedDependenciesCollapsedTree();
for (TypedDependency td : tdl)
{
    td.gov().getSpan(); // it's null!
    td.dep().getSpan(); // it's null!
}

Any ideas?

Best answer

In the end I wrote my own helper functions to recover the spans in the original string:

// Collects the leaf tokens of the parse tree in sentence order,
// then aligns them against the original text.
public HashMap<Integer, TokenSpan> getTokenSpans(String text, Tree parse)
{
    List<String> tokens = new ArrayList<String>();
    traverse(tokens, parse, parse.getChildrenAsList());
    return extractTokenSpans(text, tokens);
}

// Depth-first walk that appends the value of every leaf to tokens.
private void traverse(List<String> tokens, Tree parse, List<Tree> children)
{
    if (children == null)
        return;
    for (Tree child : children)
    {
        if (child.isLeaf())
        {
            tokens.add(child.value());
        }
        traverse(tokens, parse, child.getChildrenAsList());
    }
}

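// Walks the text and the token list in parallel and records each token's
// character offsets. Keys are 1-based token positions; spanEnd is exclusive.
private HashMap<Integer, TokenSpan> extractTokenSpans(String text, List<String> tokens)
{
    HashMap<Integer, TokenSpan> result = new HashMap<Integer, TokenSpan>();
    int spanStart, spanEnd;

    int actCharIndex = 0;
    int actTokenIndex = 0;
    // Stop when either the text or the token list is exhausted.
    while (actCharIndex < text.length() && actTokenIndex < tokens.size())
    {
        if (Character.isWhitespace(text.charAt(actCharIndex)))
        {
            actCharIndex++;
        }
        else
        {
            spanStart = actCharIndex;
            String actToken = tokens.get(actTokenIndex);
            int tokenCharIndex = 0;
            // Bounds check on the text avoids running past its end mid-token.
            while (tokenCharIndex < actToken.length()
                    && actCharIndex < text.length()
                    && text.charAt(actCharIndex) == actToken.charAt(tokenCharIndex))
            {
                tokenCharIndex++;
                actCharIndex++;
            }

            if (tokenCharIndex != actToken.length())
            {
                // The token does not appear verbatim at this position.
                throw new IllegalStateException("Token \"" + actToken
                        + "\" does not match the text at offset " + spanStart);
            }
            actTokenIndex++;
            spanEnd = actCharIndex;
            result.put(actTokenIndex, new TokenSpan(spanStart, spanEnd));
        }
    }
    return result;
}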

Then I call:

 getTokenSpans(originalString, parse)

This gives me a map from each token's position to its span in the original string. It's not an elegant solution, but at least it works.
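The answer never shows the TokenSpan class or what the resulting map looks like, so here is a minimal, self-contained sketch of the same alignment idea that runs without the Stanford library. Everything in it is an assumption made for illustration: the TokenSpanDemo class, the shape of TokenSpan (start offset inclusive, end offset exclusive), the alignTokens name, and the hard-coded token list standing in for the leaves of the parse tree.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; none of these names come from the Stanford library.
final class TokenSpanDemo {

    // Assumed shape of the answer's TokenSpan: character offsets, end exclusive.
    static final class TokenSpan {
        final int start;
        final int end;

        TokenSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    // Aligns tokens to the text left to right; keys are 1-based token positions.
    static Map<Integer, TokenSpan> alignTokens(String text, List<String> tokens) {
        Map<Integer, TokenSpan> spans = new LinkedHashMap<>();
        int cursor = 0;
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            // Skip whitespace between tokens.
            while (cursor < text.length() && Character.isWhitespace(text.charAt(cursor))) {
                cursor++;
            }
            // Fail loudly if the token is not found verbatim at this position.
            if (!text.startsWith(token, cursor)) {
                throw new IllegalArgumentException(
                        "Token " + (i + 1) + " (\"" + token + "\") not found at offset " + cursor);
            }
            spans.put(i + 1, new TokenSpan(cursor, cursor + token.length()));
            cursor += token.length();
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "The cat sat.";
        // Stand-in for the leaf values collected from the parse tree.
        List<String> tokens = List.of("The", "cat", "sat", ".");
        Map<Integer, TokenSpan> spans = alignTokens(text, tokens);
        for (Map.Entry<Integer, TokenSpan> e : spans.entrySet()) {
            TokenSpan s = e.getValue();
            System.out.println(e.getKey() + " " + s + " " + text.substring(s.start, s.end));
        }
    }
}
```

For "The cat sat." the map holds four entries, with token 1 at [0, 3) and the final period, token 4, at [11, 12). Note that this kind of alignment only works when the token text appears verbatim in the input: a parser that rewrites tokens (Penn Treebank style output can escape "(" as -LRB-, for example) would make the sketch throw rather than guess at a span.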

For java - Stanford Dependency Parser - How to get spans?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16026881/
