
python - How to convert an NLTK tree (Stanford) to Newick format in Python?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:06:59

I have this Stanford tree and I want to convert it to Newick format.

    (ROOT
      (S
        (NP (DT A) (NN friend))
        (VP
          (VBZ comes)
          (NP
            (NP (JJ early))
            (, ,)
            (NP
              (NP (NNS others))
              (SBAR
                (WHADVP (WRB when))
                (S (NP (PRP they)) (VP (VBP have) (NP (NN time))))))))))

Best answer

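There may be ways to do this with string processing alone, but I would parse the tree and then print it recursively in Newick format. A minimal implementation: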

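    import re


    class Tree(object):
        """A minimal tree built from a bracketed parse string."""

        def __init__(self, label):
            self.label = label
            self.children = []

        @staticmethod
        def _tokenize(string):
            # A token is '(', ')' or a run of non-space, non-paren characters.
            # The list is reversed so that pop() returns tokens in reading order.
            return list(reversed(re.findall(r'\(|\)|[^ \n\t()]+', string)))

        @classmethod
        def from_string(cls, string):
            tokens = cls._tokenize(string)
            return cls._tree(tokens)

        @classmethod
        def _tree(cls, tokens):
            t = tokens.pop()
            if t == '(':
                # '(' is followed by the node label, then its subtrees.
                tree = cls(tokens.pop())
                for subtree in cls._trees(tokens):
                    tree.children.append(subtree)
                return tree
            else:
                # A bare token is a leaf (terminal node).
                return cls(t)

        @classmethod
        def _trees(cls, tokens):
            # Yield sibling subtrees until the enclosing ')' has been consumed.
            # Use `return` here: `raise StopIteration` inside a generator
            # becomes a RuntimeError on Python 3.7+ (PEP 479).
            while True:
                if not tokens:
                    return
                if tokens[-1] == ')':
                    tokens.pop()
                    return
                yield cls._tree(tokens)

        def to_newick(self):
            if len(self.children) == 1:
                # Collapse single-child nodes such as (DT A) into the child.
                return self.children[0].to_newick()
            elif self.children:
                return '(' + ','.join(child.to_newick() for child in self.children) + ')'
            else:
                return self.label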
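
To make the tokenizing step concrete, here is a small standalone check of the same regular expression on a fragment of the tree from the question. The variable names (`fragment`, `tokens`, `stack`) are only illustrative and are not part of the answer's code.

```python
import re

fragment = "(NP (DT A) (NN friend))"

# Same pattern as Tree._tokenize: '(' or ')' or a run of characters
# that are neither whitespace nor parentheses.
tokens = re.findall(r'\(|\)|[^ \n\t()]+', fragment)
print(tokens)
# ['(', 'NP', '(', 'DT', 'A', ')', '(', 'NN', 'friend', ')', ')']

# Reversing once lets the parser take the next token with a cheap pop()
# from the end of the list instead of pop(0) from the front.
stack = list(reversed(tokens))
first, second = stack.pop(), stack.pop()
print(first, second)
# ( NP
```

The parser relies only on this ordering: an opening parenthesis is always followed by the node label, and everything up to the matching closing parenthesis belongs to that node.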

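Note that the conversion is, of course, lossy: only the terminal nodes (the words) are kept, and the internal labels such as NP and VP are dropped. Usage: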

    >>> s = """(ROOT (..."""
    >>> Tree.from_string(s).to_newick()
    ...
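
If the lost labels matter, one possible variation is to write each internal label after its closing parenthesis, which the Newick format permits. The sketch below is not from the original answer. The helper names (`parse`, `quote`, `to_labeled_newick`, `bracketed_to_newick`) are hypothetical, and the quoting rule (single quotes around labels containing Newick punctuation, with embedded quotes doubled) is the common Newick convention, assumed here to be what your downstream tool expects. Quoting matters for this tree because the `(, ,)` node would otherwise put a bare comma into the output.

```python
import re


def parse(tokens):
    """Parse one subtree from a reversed token stack into (label, children)."""
    tok = tokens.pop()
    if tok != '(':
        return (tok, [])          # a bare token is a leaf
    label = tokens.pop()          # '(' is followed by the node label
    children = []
    while tokens and tokens[-1] != ')':
        children.append(parse(tokens))
    if tokens:
        tokens.pop()              # discard the matching ')'
    return (label, children)


def quote(label):
    """Quote labels containing Newick punctuation (assumed convention)."""
    if re.search(r"[\s()\[\]':;,]", label):
        return "'" + label.replace("'", "''") + "'"
    return label


def to_labeled_newick(node):
    """Serialize a node, keeping internal labels after the ')'."""
    label, children = node
    if not children:
        return quote(label)
    inner = ','.join(to_labeled_newick(child) for child in children)
    return '(' + inner + ')' + quote(label)


def bracketed_to_newick(string):
    """Convert a bracketed parse string to labeled Newick, ending in ';'."""
    tokens = list(reversed(re.findall(r'\(|\)|[^ \n\t()]+', string)))
    return to_labeled_newick(parse(tokens)) + ';'


print(bracketed_to_newick("(S (NP (DT A) (NN friend)) (VP (VBZ comes)))"))
# (((A)DT,(friend)NN)NP,((comes)VBZ)VP)S;
print(bracketed_to_newick("(NP (JJ early) (, ,))"))
# ((early)JJ,(',')',')NP;
```

Unlike `to_newick` above, this version does not collapse single-child nodes, so every tag survives. Some Newick readers may reject internal nodes with only one child, so check what your target tool accepts before relying on it.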

Regarding "python - How to convert an NLTK tree (Stanford) to Newick format in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39691327/
