
python - Extracting verb phrases using spaCy

Reposted. Author: 太空狗. Updated: 2023-10-29 18:04:22

I have been using the Doc.noun_chunks property provided by spaCy to extract noun chunks. How can I use the spaCy library to extract verb phrases of the form 'VERB ? ADV * VERB +' from input text?

Best answer

This may help you.

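from __future__ import unicode_literals
import spacy
import en_core_web_sm
import textacy

# Note: textacy.Doc and pos_regex_matches come from older textacy releases.
nlp = en_core_web_sm.load()
sentence = 'The author is writing a new book.'
# An optional verb, any number of adverbs, then one or more verbs
pattern = r'<VERB>?<ADV>*<VERB>+'
doc = textacy.Doc(sentence, lang='en_core_web_sm')
verb_phrases = textacy.extract.pos_regex_matches(doc, pattern)
for verb_phrase in verb_phrases:
    print(verb_phrase.text)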

Output:

is writing
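To make the idea behind the POS regex concrete, here is a minimal pure-Python sketch of how a pattern such as `<VERB>?<ADV>*<VERB>+` can be applied to a sequence of tags. It is not textacy's implementation: the helper name `pos_regex_spans` is hypothetical, and the tags are assigned by hand (with "is" tagged VERB, as the output above implies) rather than produced by a spaCy model.

```python
import re

def pos_regex_spans(tagged_tokens, pattern):
    """Return the text of every token run whose POS tags match `pattern`."""
    # Serialise the tags into one string, e.g. "<DET><NOUN><VERB>".
    tag_string = "".join(f"<{pos}>" for _, pos in tagged_tokens)
    # Wrap each <TAG> in a group so that ?, * and + apply to the whole tag.
    regex = re.compile(re.sub(r"(<[A-Z]+>)", r"(?:\1)", pattern))
    phrases = []
    for match in regex.finditer(tag_string):
        # Counting "<" converts character offsets back into token indices.
        start = tag_string[:match.start()].count("<")
        end = start + match.group().count("<")
        phrases.append(" ".join(text for text, _ in tagged_tokens[start:end]))
    return phrases

# Hand-assigned tags: an assumption for illustration, not real tagger output.
tagged = [
    ("The", "DET"), ("author", "NOUN"), ("is", "VERB"), ("writing", "VERB"),
    ("a", "DET"), ("new", "ADJ"), ("book", "NOUN"), (".", "PUNCT"),
]
verb_phrases = pos_regex_spans(tagged, r"<VERB>?<ADV>*<VERB>+")
print(verb_phrases)
```

Because the result depends entirely on the tags, a different tag for "is" would change which phrase is returned.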

For how to highlight verb phrases, see the link below.

Highlight verb phrases using spacy and html

Another approach:

I recently noticed that textacy has made some changes to its regex matching. Based on that, I tried the following.

from __future__ import unicode_literals
import spacy
import en_core_web_sm
import textacy

nlp = en_core_web_sm.load()
sentence = 'The cat sat on the mat. The dog jumped into the water. The author is writing a book.'
# Same pattern as before, written as a list of token dicts
pattern = [{'POS': 'VERB', 'OP': '?'},
           {'POS': 'ADV', 'OP': '*'},
           {'POS': 'VERB', 'OP': '+'}]
doc = textacy.make_spacy_doc(sentence, lang='en_core_web_sm')
verb_phrases = textacy.extract.matches(doc, pattern)
for verb_phrase in verb_phrases:
    print(verb_phrase.text)

Output:

sat
jumped
writing

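I checked the POS matching at the link below, and the result above does not seem to be the expected one.

https://explosion.ai/demos/matcher

Has anyone tried building a pattern from POS tags, instead of a regexp pattern, to find verb phrases?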
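One plausible explanation for "writing" appearing without "is" (an assumption, not checked against a specific model version) is that the model tags the auxiliary "is" as AUX rather than VERB, so it can never fill the optional VERB slot. The sketch below simulates a POS/OP token pattern with a plain regular expression over hand-assigned tags. The helper `match_token_pattern` is hypothetical, handles only the '?', '*' and '+' operators, and returns leftmost-greedy matches, which is a simplification of what spaCy's Matcher does.

```python
import re

def match_token_pattern(tagged_tokens, pattern):
    """Leftmost-greedy matching of a POS/OP token pattern over tagged tokens."""
    # Turn [{'POS': 'VERB', 'OP': '?'}, ...] into "(?:<VERB>)?...".
    regex = "".join(f"(?:<{tok['POS']}>){tok.get('OP', '')}" for tok in pattern)
    tag_string = "".join(f"<{pos}>" for _, pos in tagged_tokens)
    phrases = []
    for match in re.finditer(regex, tag_string):
        # Counting "<" converts character offsets back into token indices.
        start = tag_string[:match.start()].count("<")
        end = start + match.group().count("<")
        phrases.append(" ".join(text for text, _ in tagged_tokens[start:end]))
    return phrases

# Hand-assigned tags (an assumption): "is" is AUX here, not VERB.
tagged_aux = [
    ("The", "DET"), ("author", "NOUN"), ("is", "AUX"), ("writing", "VERB"),
    ("a", "DET"), ("book", "NOUN"), (".", "PUNCT"),
]
pattern = [{'POS': 'VERB', 'OP': '?'},
           {'POS': 'ADV', 'OP': '*'},
           {'POS': 'VERB', 'OP': '+'}]
pattern_with_aux = [{'POS': 'VERB', 'OP': '?'},
                    {'POS': 'ADV', 'OP': '*'},
                    {'POS': 'AUX', 'OP': '*'},
                    {'POS': 'VERB', 'OP': '+'}]

without_aux = match_token_pattern(tagged_aux, pattern)
with_aux = match_token_pattern(tagged_aux, pattern_with_aux)
print(without_aux)
print(with_aux)
```

Under these assumed tags, only the pattern that allows AUX tokens recovers the full phrase "is writing".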

Edit 2:

import spacy
from spacy.matcher import Matcher
from spacy.util import filter_spans

nlp = spacy.load('en_core_web_sm')

sentence = 'The cat sat on the mat. He quickly ran to the market. The dog jumped into the water. The author is writing a book.'
# Optional verb, then adverbs, then auxiliaries, then one or more verbs
pattern = [{'POS': 'VERB', 'OP': '?'},
           {'POS': 'ADV', 'OP': '*'},
           {'POS': 'AUX', 'OP': '*'},
           {'POS': 'VERB', 'OP': '+'}]

# instantiate a Matcher instance
matcher = Matcher(nlp.vocab)
# spaCy v2 signature; in spaCy v3 use matcher.add("Verb phrase", [pattern])
matcher.add("Verb phrase", None, pattern)

doc = nlp(sentence)
# call the matcher to find matches
matches = matcher(doc)
spans = [doc[start:end] for _, start, end in matches]

# keep only the longest non-overlapping spans
print(filter_spans(spans))

Output:

[sat, quickly ran, jumped, is writing]

Based on help from mdmjsh's answer.
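The Matcher reports every match, including shorter ones nested inside longer ones (for example both "is writing" and "writing"), which is why filter_spans is applied. As a rough sketch of that filtering step, here is a pure-Python version that works on (start, end) token offsets instead of spaCy Span objects. The function name `filter_overlapping` and the offsets are hypothetical, chosen only for illustration.

```python
def filter_overlapping(spans):
    """Keep the longest spans, dropping any that overlap one already kept."""
    # Longest first; ties go to the span that starts earlier.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept, seen_tokens = [], set()
    for start, end in ordered:
        # Longer spans are processed first, so checking the two end tokens
        # is enough to detect an overlap with a span that was already kept.
        if start not in seen_tokens and end - 1 not in seen_tokens:
            kept.append((start, end))
            seen_tokens.update(range(start, end))
    # Return the survivors in document order.
    return sorted(kept)

# Hypothetical token offsets: (0, 2) could stand for "is writing" and
# (1, 2) for the nested match "writing".
candidates = [(0, 1), (0, 2), (1, 2), (5, 6)]
filtered = filter_overlapping(candidates)
print(filtered)
```

Preferring the longest span is what turns the raw matches into one phrase per verb group.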

Edit 3: Strange behavior. With the following pattern, the verb phrases in the sentence below are identified correctly at https://explosion.ai/demos/matcher

pattern = [{'POS': 'VERB', 'OP': '?'},
           {'POS': 'ADV', 'OP': '*'},
           {'POS': 'VERB', 'OP': '+'}]

The black cat must be really meowing loud in the yard.

But running the code produces the following output.

[must, really meowing]
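The split output is what one would expect if the model tagged "must" as VERB and "be" as AUX: the AUX token breaks the run, leaving "must" and "really meowing" as separate matches. These tags are an inference from the reported output, not something confirmed against a model. The sketch below reproduces the behavior with hand-assigned tags and a regex over those tags. The helper `match_tag_regex` and the widened pattern are hypothetical; in spaCy the widened first slot would correspond roughly to `{'POS': {'IN': ['VERB', 'AUX']}, 'OP': '*'}`, which has not been tested here against a real model.

```python
import re

def match_tag_regex(tagged_tokens, tag_regex):
    """Leftmost-greedy regex matching over a string of <POS> tags."""
    tag_string = "".join(f"<{pos}>" for _, pos in tagged_tokens)
    phrases = []
    for match in re.finditer(tag_regex, tag_string):
        # Counting "<" converts character offsets back into token indices.
        start = tag_string[:match.start()].count("<")
        end = start + match.group().count("<")
        phrases.append(" ".join(text for text, _ in tagged_tokens[start:end]))
    return phrases

# Assumed tags, inferred from the reported output rather than from a model.
tagged = [
    ("The", "DET"), ("black", "ADJ"), ("cat", "NOUN"), ("must", "VERB"),
    ("be", "AUX"), ("really", "ADV"), ("meowing", "VERB"), ("loud", "ADV"),
    ("in", "ADP"), ("the", "DET"), ("yard", "NOUN"), (".", "PUNCT"),
]

# VERB? ADV* VERB+ : the pattern from Edit 3.
original = match_tag_regex(tagged, r"(?:<VERB>)?(?:<ADV>)*(?:<VERB>)+")
# (VERB|AUX)* ADV* VERB+ : a hypothetical variant that lets AUX join the run.
widened = match_tag_regex(tagged, r"(?:<VERB>|<AUX>)*(?:<ADV>)*(?:<VERB>)+")
print(original)
print(widened)
```

If the tags differ between the online demo and the locally installed model, the same pattern can give different results in the two places, so printing token.pos_ for each token is a reasonable first check.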

For more on python - extracting verb phrases using spaCy, see the similar question on Stack Overflow: https://stackoverflow.com/questions/47856247/
