
python - How to find proper nouns using spaCy NLP

Reposted. Author: 行者123. Last updated: 2023-12-02 02:41:57

I am building a keyword extractor with spaCy. The keyword I want to find is OpTic Gaming in the following text:

"The company was also one of OpTic Gaming's main sponsors during the legendary organization's run to their first Call of Duty Championship back in 2017"

How can I parse OpTic Gaming out of this text? If I use noun_chunks, I get OpTic Gaming's main sponsors, and if I take the tokens, I get ["OpTic", "Gaming", "'s"].

import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp("The company was also one of OpTic Gaming's main sponsors during the legendary organization's run to their first Call of Duty Championship back in 2017")

for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_,
          chunk.root.head.text)

The company company nsubj was

OpTic Gaming's main sponsors sponsors pobj of

their first Call Call pobj to

Duty Championship Championship pobj of

Best answer

spaCy extracts the part of speech (proper noun, determiner, verb, etc.) for you. You can access it at the token level with token.pos_.

In your case:

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("The company was also one of OpTic Gaming's main sponsors during the legendary organization's run to their first Call of Duty Championship back in 2017")

for tok in doc:
    print(tok, tok.pos_)

...

one NUM

of ADP

OpTic PROPN

Gaming PROPN

...

You can then filter the proper nouns, group consecutive proper nouns together, and slice the doc to get the nominal groups:

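def extract_proper_nouns(doc):
    # Indices of all tokens tagged as proper nouns
    pos = [tok.i for tok in doc if tok.pos_ == "PROPN"]
    consecutives = []
    current = []
    for elt in pos:
        if len(current) == 0:
            current.append(elt)
        else:
            if current[-1] == elt - 1:
                # Adjacent to the previous proper noun: extend the run
                current.append(elt)
            else:
                # Gap found: close the current run and start a new one
                consecutives.append(current)
                current = [elt]
    if len(current) != 0:
        consecutives.append(current)
    # Slice the doc so each run of indices becomes a Span
    return [doc[consecutive[0]:consecutive[-1]+1] for consecutive in consecutives]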

extract_proper_nouns(doc)

[OpTic Gaming, Duty Championship]
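
The grouping step can also be written more compactly with itertools.groupby. The following is only a sketch, not part of the original answer: proper_noun_spans is a hypothetical helper, and it works on plain (text, pos) pairs rather than on a spaCy Doc so that it runs without a model. The tags for "'s", "main" and "sponsors" are hand-written assumptions for illustration; only the first four mirror the output shown earlier.

```python
from itertools import groupby


def proper_noun_spans(tagged):
    """Join runs of adjacent PROPN tokens into strings.

    `tagged` is a list of (text, pos) pairs, which could be built from a
    spaCy Doc with [(tok.text, tok.pos_) for tok in doc].
    """
    spans = []
    # groupby splits the sequence into runs that share the same key value,
    # so each run is either all proper nouns or contains none.
    for is_propn, run in groupby(tagged, key=lambda pair: pair[1] == "PROPN"):
        if is_propn:
            spans.append(" ".join(text for text, _ in run))
    return spans


# Hand-written tags (assumed, not produced by a model here)
tagged = [
    ("one", "NUM"),
    ("of", "ADP"),
    ("OpTic", "PROPN"),
    ("Gaming", "PROPN"),
    ("'s", "PART"),
    ("main", "ADJ"),
    ("sponsors", "NOUN"),
]

print(proper_noun_spans(tagged))  # ['OpTic Gaming']
```

Unlike extract_proper_nouns, this version returns plain strings instead of Span objects, so it fits best when only the keyword text is needed.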

More details here: https://spacy.io/usage/linguistic-features

Regarding python - How to find proper nouns using spaCy NLP, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/63450423/
