python - How do I make a tree from the output of a dependency parser?

Reposted. Author: 行者123. Updated: 2023-12-01 14:46:19

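I am trying to make a tree (a nested dictionary) from the output of a dependency parser. The sentence is "I shot an elephant in my sleep". I was able to get the output described in this link: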
How do I do dependency parsing in NLTK?

nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)

To convert this list of tuples into a nested dictionary, I used the following link:
How to convert python list of tuples into tree?
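def build_tree(list_of_tuples):
    # Each tuple is (relation, governor, dependent).
    # Map every dependent to ((relation, governor), children).
    all_nodes = {n[2]: ((n[0], n[1]), {}) for n in list_of_tuples}
    root = {}
    print(all_nodes)
    for item in list_of_tuples:
        rel, gov, dep = item
        # Compare strings by value (!=), not by identity (is not).
        if gov != 'ROOT':
            all_nodes[gov][1][dep] = all_nodes[dep]
        else:
            root[dep] = all_nodes[dep]
    return root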

This gives output like the following:
{'shot': (('ROOT', 'ROOT'),
          {'I': (('nsubj', 'shot'), {}),
           'elephant': (('dobj', 'shot'), {'an': (('det', 'elephant'), {})}),
           'sleep': (('nmod', 'shot'),
                     {'in': (('case', 'sleep'), {}), 'my': (('nmod:poss', 'sleep'), {})})})}

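To find the root-to-leaf path, I used the following link: Return root to specific leaf from a nested dictionary tree

[Making the tree and finding the path are two separate things.] The second goal is to find the path from the root to a leaf node, as is done in Return root to specific leaf from a nested dictionary tree.
However, I want the root-to-leaf path expressed as dependency relations (the dependency path).
So, for example, when I call recurse_category(categories, 'an'), where categories is the nested tree structure and 'an' is a word in the tree, I should get ROOT-nsubj-dobj (the dependency relations up to the root) as output.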

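Best answer

First, if you just want to use the pre-trained models of the Stanford CoreNLP dependency parser, you should use CoreNLPDependencyParser from nltk.parse.corenlp and avoid the old nltk.parse.stanford interface.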

Stanford Parser and NLTK

After downloading the Java server and starting it in a terminal, in Python:

>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> sent = "I shot an elephant with a banana .".split()
>>> parses = list(dep_parser.parse(sent))
>>> type(parses[0])
<class 'nltk.parse.dependencygraph.DependencyGraph'>

Now we see that the parse is of type DependencyGraph from nltk.parse.dependencygraph: https://github.com/nltk/nltk/blob/develop/nltk/parse/dependencygraph.py#L36

To convert a DependencyGraph to an nltk.tree.Tree, simply call DependencyGraph.tree():
>>> parses[0].tree()
Tree('shot', ['I', Tree('elephant', ['an']), Tree('banana', ['with', 'a']), '.'])

>>> parses[0].tree().pretty_print()
          shot
  _________|____________
 |   |  elephant      banana
 |   |     |       _____|_____
 I   .     an    with         a

To convert it to the bracketed parse format:
>>> print(parses[0].tree())
(shot I (elephant an) (banana with a) .)

If you are looking for dependency triples:
>>> [(governor, dep, dependent) for governor, dep, dependent in parses[0].triples()]
[(('shot', 'VBD'), 'nsubj', ('I', 'PRP')), (('shot', 'VBD'), 'dobj', ('elephant', 'NN')), (('elephant', 'NN'), 'det', ('an', 'DT')), (('shot', 'VBD'), 'nmod', ('banana', 'NN')), (('banana', 'NN'), 'case', ('with', 'IN')), (('banana', 'NN'), 'det', ('a', 'DT')), (('shot', 'VBD'), 'punct', ('.', '.'))]

>>> for governor, dep, dependent in parses[0].triples():
... print(governor, dep, dependent)
...
('shot', 'VBD') nsubj ('I', 'PRP')
('shot', 'VBD') dobj ('elephant', 'NN')
('elephant', 'NN') det ('an', 'DT')
('shot', 'VBD') nmod ('banana', 'NN')
('banana', 'NN') case ('with', 'IN')
('banana', 'NN') det ('a', 'DT')
('shot', 'VBD') punct ('.', '.')
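
These triples can also be folded into the kind of nested dictionary the question asks for. The following is only a minimal sketch and not part of NLTK: triples_to_nested_dict is a hypothetical helper name, the triples are copied from the output above instead of coming from a live parser, and the code assumes every word occurs only once in the sentence (repeated words would collide as dictionary keys).

```python
# Triples copied from the parses[0].triples() output above.
triples = [
    (('shot', 'VBD'), 'nsubj', ('I', 'PRP')),
    (('shot', 'VBD'), 'dobj', ('elephant', 'NN')),
    (('elephant', 'NN'), 'det', ('an', 'DT')),
    (('shot', 'VBD'), 'nmod', ('banana', 'NN')),
    (('banana', 'NN'), 'case', ('with', 'IN')),
    (('banana', 'NN'), 'det', ('a', 'DT')),
    (('shot', 'VBD'), 'punct', ('.', '.')),
]


def triples_to_nested_dict(triples):
    """Hypothetical helper: build {word: (relation, {child: ...})}.

    Assumes each word appears only once in the sentence.
    """
    children = {}  # word -> dict of its dependents
    relation = {}  # word -> relation to its governor
    for (gov, _gov_tag), rel, (dep, _dep_tag) in triples:
        children.setdefault(gov, {})
        children.setdefault(dep, {})
        relation[dep] = rel
    # Attach each dependent's (shared) child dict under its governor.
    for (gov, _gov_tag), rel, (dep, _dep_tag) in triples:
        children[gov][dep] = (rel, children[dep])
    # Words that are never a dependent are roots.
    roots = [word for word in children if word not in relation]
    return {word: ('ROOT', children[word]) for word in roots}


tree = triples_to_nested_dict(triples)
print(tree)
```

Keying on the word keeps the result close to the structure in the question; for sentences with repeated words, keying on the token index would be the safer choice.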

CoNLL format:
>>> print(parses[0].to_conll(style=10))
1 I I PRP PRP _ 2 nsubj _ _
2 shot shoot VBD VBD _ 0 ROOT _ _
3 an a DT DT _ 4 det _ _
4 elephant elephant NN NN _ 2 dobj _ _
5 with with IN IN _ 7 case _ _
6 a a DT DT _ 7 det _ _
7 banana banana NN NN _ 2 nmod _ _
8 . . . . _ 2 punct _ _
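
The head column of the CoNLL output is enough to recover the dependency path from the root to any word, which is the second goal in the question. The sketch below is one possible approach under stated assumptions, not an NLTK API: dependency_path is a hypothetical helper, the rows are (index, word, head index, relation) tuples copied by hand from the table above, and the lookup takes the first token whose form matches the given word.

```python
# (index, word, head index, relation), copied from the CoNLL output above.
rows = [
    (1, 'I', 2, 'nsubj'),
    (2, 'shot', 0, 'ROOT'),
    (3, 'an', 4, 'det'),
    (4, 'elephant', 2, 'dobj'),
    (5, 'with', 7, 'case'),
    (6, 'a', 7, 'det'),
    (7, 'banana', 2, 'nmod'),
    (8, '.', 2, 'punct'),
]


def dependency_path(rows, word):
    """Hypothetical helper: relations from ROOT down to `word`.

    Uses the first token whose form equals `word`; raises
    StopIteration if the word is not in the sentence.
    """
    by_index = {idx: (head, rel) for idx, _form, head, rel in rows}
    current = next(idx for idx, form, _head, _rel in rows if form == word)
    path = []
    # Follow head pointers upwards until the artificial root (index 0).
    while current != 0:
        head, rel = by_index[current]
        path.append(rel)
        current = head
    return '-'.join(reversed(path))


print(dependency_path(rows, 'an'))  # ROOT-dobj-det
```

Walking head pointers upwards avoids searching the nested dictionary: each token has exactly one head, so the path is found in as many steps as the word is deep in the tree.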

Regarding "python - How do I make a tree from the output of a dependency parser?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52148690/