gpt4 book ai didi

python - 使用 Stanza 和 CoreNLPClient 提取名词短语

转载 作者:行者123 更新时间:2023-12-04 16:40:40 30 4
gpt4 key购买 nike

我正在尝试使用 Stanza(使用斯坦福 CoreNLP)从句子中提取名词短语。这只能通过 Stanza 中的 CoreNLPClient 模块来完成。

# Import client module
from stanza.server import CoreNLPClient
# Construct a CoreNLPClient with some basic annotators, a memory allocation of 4GB, and port number 9001
client = CoreNLPClient(annotators=['tokenize','ssplit','pos','lemma','ner', 'parse'], memory='4G', endpoint='http://localhost:9001')

这是一个句子的例子,我使用的是 tregrex函数在客户端获取所有名词短语。 Tregex函数返回 dict of dicts在 python 中。因此我需要处理 tregrex 的输出在将其传递给 Tree.fromstring 之前NLTK 中的函数以将名词短语正确提取为字符串。
pattern = 'NP'
text = "Albert Einstein was a German-born theoretical physicist. He developed the theory of relativity."
matches = client.tregrex(text, pattern) ``

因此,我想出了方法 stanza_phrases必须循环遍历 dict of dicts这是 tregrex 的输出并正确格式化 Tree.fromstring在 NLTK 中。
def stanza_phrases(matches):
Nps = []
for match in matches:
for items in matches['sentences']:
for keys,values in items.items():
s = '(ROOT\n'+ values['match']+')'
Nps.extend(extract_phrase(s, pattern))
return set(Nps)

生成由 NLTK 使用的树
from nltk.tree import Tree
def extract_phrase(tree_str, label):
phrases = []
trees = Tree.fromstring(tree_str)
for tree in trees:
for subtree in tree.subtrees():
if subtree.label() == label:
t = subtree
t = ' '.join(t.leaves())
phrases.append(t)

return phrases

这是我的输出:
{'Albert Einstein', 'He', 'a German-born theoretical physicist', 'relativity',  'the theory', 'the theory of relativity'}

有没有办法可以用更少的行数来提高代码效率(尤其是 stanza_phrasesextract_phrase 方法)

最佳答案

from stanza.server import CoreNLPClient

# get noun phrases with tregex
def noun_phrases(_client, _text, _annotators=None):
pattern = 'NP'
matches = _client.tregex(_text,pattern,annotators=_annotators)
print("\n".join(["\t"+sentence[match_id]['spanString'] for sentence in matches['sentences'] for match_id in sentence]))

# English example
with CoreNLPClient(timeout=30000, memory='16G') as client:
englishText = "Albert Einstein was a German-born theoretical physicist. He developed the theory of relativity."
print('---')
print(englishText)
noun_phrases(client,englishText,_annotators="tokenize,ssplit,pos,lemma,parse")

# French example
with CoreNLPClient(properties='french', timeout=30000, memory='16G') as client:
frenchText = "Je suis John."
print('---')
print(frenchText)
noun_phrases(client,frenchText,_annotators="tokenize,ssplit,mwt,pos,lemma,parse")

关于python - 使用 Stanza 和 CoreNLPClient 提取名词短语,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/61633485/

30 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com