
spaCy - PhraseMatcher to match a phrase with words in between

Reposted. Author: 行者123. Updated: 2023-12-04 10:12:17

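Suppose I have the following two sentences: "Onions are being cut. However, a great big cut to the onions have been observed", and I want to match the phrase "cutting onions". This is just a minimal example.

My requirement is that the algorithm goes through all sentences and returns a bool for each one, indicating whether that sentence contains the phrase. In addition, I only want to match the lemmatized forms, and there may be 0 or more words between the words of the phrase. So for the example above I want it to return [False, True]. How can I do that?

My half attempt is below (the places where I need help are marked TODO):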

import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load('en_core_web_sm')
matcher = PhraseMatcher(nlp.vocab)

corpus = "onions are being cut. However, a great big cut to the onions have been observed"
pattern = "Cutting onions"
doc = nlp(corpus)
# TODO: how do I change the pattern to lemmatize and include any # of words between
matcher.add('pat1', None, pattern)

results = []
for s in doc.sents:
    # TODO: can I use sentences as a doc?
    matches = matcher(s)
    if len(matches) > 0:
        results.append(True)
    else:
        results.append(False)

Best answer

I suggest using spacy.matcher.Matcher and, for each match it returns, looking up the sentence that contains the match.

See the example demo:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)

corpus = "onions are being cut. However, a great big cut to the onions have been observed"
doc = nlp(corpus)
# Token pattern: lemma "cut", then zero or more alphabetic tokens, then lemma "onion"
pattern = [{'LEMMA': 'cut'},
           {'IS_ALPHA': True, 'OP': '*'},
           {'LEMMA': 'onion'}]
# spaCy v2 signature; on spaCy v3 use matcher.add('pat', [pattern])
matcher.add('pat', None, pattern)

matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print("Match ID: {}\nString ID: {}\nStart: {}\nEnd: {}\nText: {}\nSentence: {}".format(
        match_id, string_id, start, end, span.text, span.sent))

Output:
Match ID: 5387953638794962156
String ID: pat
Start: 10
End: 14
Text: cut to the onions
Sentence: However, a great big cut to the onions have been observed
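
The demo prints the sentence for each match, while the question asks for one bool per sentence. A minimal sketch of that step follows, assuming spaCy v3. The helper name sentence_flags is hypothetical, and the lemmas and sentence starts are supplied by hand so that the sketch does not depend on a downloaded model; with a loaded pipeline you would pass nlp(corpus) instead of the hand-built Doc.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc


def sentence_flags(doc, matcher):
    """Return one bool per sentence: True if a match lies in that sentence."""
    # Span.sent is the sentence containing the match; its start index identifies it.
    hit_starts = {doc[start:end].sent.start for _, start, end in matcher(doc)}
    return [sent.start in hit_starts for sent in doc.sents]


nlp = spacy.blank("en")  # vocabulary only; no trained model is needed here
matcher = Matcher(nlp.vocab)
pattern = [{"LEMMA": "cut"}, {"IS_ALPHA": True, "OP": "*"}, {"LEMMA": "onion"}]
matcher.add("pat", [pattern])  # spaCy v3 signature

# Assumption: hand-written tokens, lemmas and sentence starts for the example text.
words = ["onions", "are", "being", "cut", ".", "However", ",", "a", "great",
         "big", "cut", "to", "the", "onions", "have", "been", "observed"]
lemmas = ["onion", "be", "be", "cut", ".", "however", ",", "a", "great",
          "big", "cut", "to", "the", "onion", "have", "be", "observe"]
sent_starts = [i in (0, 5) for i in range(len(words))]
doc = Doc(nlp.vocab, words=words, lemmas=lemmas, sent_starts=sent_starts)

results = sentence_flags(doc, matcher)
print(results)
```

Running the matcher once over the whole Doc and mapping each match back through span.sent keeps to the approach of the answer; the sentence boundaries and lemmas you get from a real pipeline may differ from the hand-written ones used here.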

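Note that the pattern = [{'LEMMA': 'cut'},{'IS_ALPHA': True, 'OP': '*'},{'LEMMA': 'onion'}] matches a span that starts with a token whose lemma is cut ( {'LEMMA': 'cut'} ), followed by zero or more alphabetic tokens ( {'IS_ALPHA': True, 'OP': '*'} ), followed by a token whose lemma is onion. Because punctuation is not alphabetic, the match cannot run past the period at the end of the first sentence.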
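
To make the three parts of the pattern concrete, here is a plain-Python sketch of the same rule without spaCy. It is only an illustration of the matching logic, not how spaCy implements it; the function name contains_cut_onion and the hand-written (text, lemma) pairs are assumptions.

```python
def contains_cut_onion(tokens):
    """tokens: list of (text, lemma) pairs for one sentence."""
    for i, (_, lemma) in enumerate(tokens):
        if lemma != "cut":
            continue  # the match has to start at a token with lemma "cut"
        for text, next_lemma in tokens[i + 1:]:
            if next_lemma == "onion":
                return True  # reached the closing token of the pattern
            if not text.isalpha():
                break  # a non-alphabetic token (e.g. punctuation) ends the gap
    return False


# Assumption: tokens and lemmas written out by hand for the two example sentences.
sentence_1 = [("onions", "onion"), ("are", "be"), ("being", "be"),
              ("cut", "cut"), (".", ".")]
sentence_2 = [("However", "however"), (",", ","), ("a", "a"), ("great", "great"),
              ("big", "big"), ("cut", "cut"), ("to", "to"), ("the", "the"),
              ("onions", "onion"), ("have", "have"), ("been", "be"),
              ("observed", "observe")]

flags = [contains_cut_onion(s) for s in (sentence_1, sentence_2)]
print(flags)
```

The first sentence is rejected because "onions" comes before "cut" there, and the pattern is order-sensitive.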

This question and answer, "spaCy - PhraseMatcher to match a phrase with words in between", originally appeared on Stack Overflow: https://stackoverflow.com/questions/61265543/
