gpt4 book ai didi

python - Spacy Matcher - 只匹配最长的字符串

转载 作者:行者123 更新时间:2023-12-02 19:22:55 25 4
gpt4 key购买 nike

<分区>

我正在尝试使用 spacy 模式匹配器创建名词 block 。例如,如果我有一句话“冰球混战花了几个小时”。我想返回“冰球混战”和“时间”。我目前有这个:

from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")

matcher = Matcher(nlp.vocab)
matcher.add("NounChunks", None, [{"POS": "NOUN"}, {"POS": "NOUN", "OP": "*"}, {"POS": "NOUN", "OP": "*"}] )

doc = nlp("The ice hockey scrimmage took hours.")
matches = matcher(doc)

for match_id, start, end in matches:
string_id = nlp.vocab.strings[match_id]
span = doc[start:end]
print(match_id, string_id, start, end, span.text)

但它会返回所有版本的“冰球混战”,而不仅仅是最长的。

12482938965902279598 NounChunks 1 2 ice
12482938965902279598 NounChunks 1 3 ice hockey
12482938965902279598 NounChunks 2 3 hockey
12482938965902279598 NounChunks 1 4 ice hockey scrimmage
12482938965902279598 NounChunks 2 4 hockey scrimmage
12482938965902279598 NounChunks 3 4 scrimmage
12482938965902279598 NounChunks 5 6 hours

在如何定义模式方面我是否遗漏了什么?我希望它只返回:

12482938965902279598 NounChunks 1 4 ice hockey scrimmage
12482938965902279598 NounChunks 5 6 hours

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com