
nlp - How to split an NLP parse tree into clauses (independent and subordinate)?

Reposted. Author: 行者123. Updated: 2023-12-04 07:18:41

Given an NLP parse tree such as

(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))

the original sentence is "You could say that they regularly catch a shower, which adds to their exhilaration and joie de vivre."

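How can the clauses be extracted and the text reverse-engineered from them?
We would split at S and SBAR (to preserve the type of each clause, e.g. subordinate):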
- (S (NP (PRP You)) (VP (MD could) (VP (VB say)
- (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower))
- (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))

to arrive at
- You could say
- that they regularly catch a shower
- , which adds to their exhilaration and joie de vivre.

Splitting at S and SBAR seems easy. The hard part seems to be stripping all the POS tags and chunk labels from the fragments.

Best answer

You can use Tree.subtrees(). For more information, see the NLTK Tree class documentation.
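As a quick illustration of what `Tree.subtrees()` yields, here is a minimal sketch that uses its optional `filter` argument to keep only the S and SBAR nodes and flattens each one with `leaves()`, which drops the POS tags and chunk labels. It uses the shorter example tree; the names `clause_labels` and `clauses` are illustrative only and not part of the original answer.

```python
from nltk import Tree

# Shorter example: a subordinate clause followed by the main clause.
parse_str = "(ROOT (S (SBAR (IN Though) (S (NP (PRP he)) (VP (VBD was) (ADJP (RB very) (JJ rich))))) (, ,) (NP (PRP he)) (VP (VBD was) (ADVP (RB still)) (ADJP (RB very) (JJ unhappy))) (. .)))"
tree = Tree.fromstring(parse_str)

# Labels treated as clause boundaries (an assumption matching the question).
clause_labels = {"S", "SBAR"}

# subtrees() walks the tree in pre-order, so outer clauses come before nested ones.
clauses = [
    (st.label(), " ".join(st.leaves()))
    for st in tree.subtrees(filter=lambda st: st.label() in clause_labels)
]

for label, text in clauses:
    print(label, "->", text)
```

Each outer clause still contains the text of the clauses nested inside it, so the overlap has to be removed in a separate step.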

Code:

from nltk import Tree

parse_str = "(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))"
# parse_str = "(ROOT (S (SBAR (IN Though) (S (NP (PRP he)) (VP (VBD was) (ADJP (RB very) (JJ rich))))) (, ,) (NP (PRP he)) (VP (VBD was) (ADVP (RB still)) (ADJP (RB very) (JJ unhappy))) (. .)))"

t = Tree.fromstring(parse_str)
# print(t)

# Collect the flattened text of every S and SBAR subtree (pre-order, outermost first).
subtexts = []
for subtree in t.subtrees():
    if subtree.label() == "S" or subtree.label() == "SBAR":
        # print(subtree.leaves())
        subtexts.append(' '.join(subtree.leaves()))
# print(subtexts)

presubtexts = subtexts[:]  # ADDED IN EDIT for leftover check

# Working from the innermost clause outward, cut each text off where the next (nested) clause begins.
for i in reversed(range(len(subtexts) - 1)):
    subtexts[i] = subtexts[i][0:subtexts[i].index(subtexts[i + 1])]

for text in subtexts:
    print(text)

# ADDED IN EDIT - Not sure for generalized cases
# Whatever follows the first nested clause in the top-level sentence (here, the final period).
leftover = presubtexts[0][presubtexts[0].index(presubtexts[1]) + len(presubtexts[1]):]
print(leftover)

Output:
You could say 
that
they regularly catch a shower ,
which
adds to their exhilaration and joie de vivre
.
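The string-slicing approach relies on each nested clause appearing verbatim inside its parent, which the answer itself flags as uncertain for general cases. As an alternative sketch (not the answer's method), the tree can be walked recursively so that each token is assigned to the innermost enclosing SBAR, or to the main clause if there is none. The helper name `split_clauses` is hypothetical, and splitting only at SBAR is an assumption: coordinated S nodes are not separated, and punctuation stays with whichever clause structurally contains it.

```python
from nltk import Tree


def split_clauses(tree):
    """Return (label, text) pairs: the main clause first, then each SBAR in tree order."""
    clauses = []

    def walk(node, current):
        for child in node:
            if isinstance(child, str):
                # A leaf token belongs to the clause currently being built.
                current.append(child)
            elif child.label() == "SBAR":
                # Open a new subordinate clause and fill it from this subtree.
                tokens = []
                clauses.append(("SBAR", tokens))
                walk(child, tokens)
            else:
                # Any other node (NP, VP, S, ...) contributes to the current clause.
                walk(child, current)

    main_tokens = []
    clauses.append(("S", main_tokens))
    walk(tree, main_tokens)
    return [(label, " ".join(tokens)) for label, tokens in clauses]


parse_str = "(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))"
result = split_clauses(Tree.fromstring(parse_str))
for label, text in result:
    print(label, "->", text)
# S -> You could say .
# SBAR -> that they regularly catch a shower ,
# SBAR -> which adds to their exhilaration and joie de vivre
```

Because tokens are distributed during the walk, no substring search is needed and repeated phrases cannot confuse the split. The trade-off is that the comma and the final period land in the clause that contains them in the tree rather than where the question's desired output places them.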

Regarding "nlp - How to split an NLP parse tree into clauses (independent and subordinate)?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39320015/
