
python - Named Entity Recognition (NER) for multiple languages

Reposted. Author: 行者123. Updated: 2023-12-04 07:52:27

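I am writing some code to perform Named Entity Recognition (NER), and it works well for English text. However, I would like to be able to apply NER to text in any language. To do this, I want to 1) identify the language of the text, and then 2) apply NER for the identified language. For step 2, I am unsure whether to A) translate the text to English and then apply English NER, or B) apply NER in the identified language directly.

Below is the code I have so far. I would like NER to work for text2, or for text in any other language, once the language has been identified: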

import spacy
from spacy_langdetect import LanguageDetector
from langdetect import DetectorFactory

text = 'In 1793, Alexander Hamilton recruited Webster to move to New York City and become an editor for a Federalist Party newspaper.'
text2 = 'Em 1793, Alexander Hamilton recrutou Webster para se mudar para a cidade de Nova York e se tornar editor de um jornal do Partido Federalista.'

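# Step 1: Identify the language of a text
DetectorFactory.seed = 0  # fix the seed so langdetect gives reproducible results
nlp = spacy.load('en_core_web_sm')
# spaCy 2.x call style: the component object is passed directly.
# spaCy 3.x expects the name of a registered component here instead.
nlp.add_pipe(LanguageDetector(), name='language_detector', last=True)
doc = nlp(text)
print(doc._.language)

# Step 2: NER (reuses the doc from step 1; the loaded model is English only)
entities = [(str(x), x.label_) for x in doc.ents]
print(entities)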

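Does anyone have experience with this? Many thanks!

Best answer

spaCy needs to load the right model for the right language.

See https://spacy.io/usage/models for the available models.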
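The answer's code builds each model name from a language code. As a minimal sketch, that naming rule can be written as a standalone helper. `spacy_model_name` and its `size` parameter are hypothetical names introduced here, and the rule is only the one the answer uses (`web` for English, `news` for everything else). It does not hold for every language spaCy ships, so any name it produces should be checked against the models page before loading.

```python
def spacy_model_name(lang: str, size: str = "lg") -> str:
    """Build a spaCy model name such as 'pt_core_news_lg' from a language code.

    Assumption: English uses the 'web' genre and every other language uses
    'news', as in the answer's loop. Verify the result on the models page.
    """
    if size not in {"sm", "md", "lg"}:
        raise ValueError(f"unknown model size: {size!r}")
    genre = "web" if lang == "en" else "news"
    return f"{lang}_core_{genre}_{size}"


if __name__ == "__main__":
    # Print the names for the four languages used in the answer.
    for code in ["en", "es", "pt", "ru"]:
        print(code, "->", spacy_model_name(code))
```

The returned string is what would be passed to `spacy.load`, provided the corresponding model package has been installed.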

import spacy
from langdetect import detect

# Load one pipeline per language up front, keyed by language code.
nlp = {}
for lang in ["en", "es", "pt", "ru"]:  # Fill in the languages you want, hopefully they are supported by spacy.
    if lang == "en":
        nlp[lang] = spacy.load(lang + '_core_web_lg')
    else:
        nlp[lang] = spacy.load(lang + '_core_news_lg')

def entites(text):
    # Step 1: detect the language; step 2: run the matching pipeline.
    lang = detect(text)
    try:
        nlp2 = nlp[lang]
    except KeyError:
        # Raise (rather than return) so the caller cannot mistake the error for a result.
        raise Exception(lang + " model is not loaded")
    return [(str(x), x.label_) for x in nlp2(str(text)).ents]
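`detect` returns a single language code, so the function above fails for any language that has no loaded model. A possible refinement, sketched here with hypothetical names (`pick_language`, `min_prob`, `fallback`), chooses among ranked candidates instead. It takes plain `(code, probability)` pairs; with langdetect these could be built from `detect_langs(text)`, whose results expose `.lang` and `.prob`. The threshold value is an assumption, not a recommendation from the answer.

```python
from typing import Iterable, Optional, Set, Tuple


def pick_language(
    candidates: Iterable[Tuple[str, float]],
    supported: Set[str],
    min_prob: float = 0.5,
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Return the most probable supported language, or `fallback` if none qualifies."""
    # Walk the candidates from most to least probable.
    for lang, prob in sorted(candidates, key=lambda c: c[1], reverse=True):
        if prob < min_prob:
            break  # remaining candidates are even less likely
        if lang in supported:
            return lang
    return fallback


if __name__ == "__main__":
    # Hypothetical detector output for a Portuguese sentence.
    guesses = [("pt", 0.86), ("es", 0.14)]
    print(pick_language(guesses, supported={"en", "es", "pt", "ru"}))
```

Returning a fallback code instead of raising lets the caller decide what to do with unsupported text, for example skipping it or routing it to a default pipeline.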

Then you can run both steps in a single call:

ents = entites(text)
print(ents)
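Loading every large model up front costs startup time and memory even for languages that never appear in the input. The sketch below, under stated assumptions, loads a pipeline only the first time its language is detected and then keeps it cached. `LazyNER`, `detect_fn`, `load_fn` and `model_names` are hypothetical names, not part of spaCy or langdetect. The detector and loader are passed in as callables, so `langdetect.detect` and `spacy.load` can be plugged in without the class importing either library.

```python
from typing import Callable, Dict, List, Tuple


class LazyNER:
    """Detect the language of a text, then run the matching NER pipeline."""

    def __init__(
        self,
        detect_fn: Callable[[str], str],
        load_fn: Callable[[str], Callable],
        model_names: Dict[str, str],
    ) -> None:
        self._detect = detect_fn          # e.g. langdetect.detect
        self._load = load_fn              # e.g. spacy.load
        self._model_names = model_names   # language code -> model name
        self._pipelines: Dict[str, Callable] = {}

    def entities(self, text: str) -> Tuple[str, List[Tuple[str, str]]]:
        """Return the detected language and a list of (entity text, label) pairs."""
        lang = self._detect(text)
        if lang not in self._model_names:
            raise LookupError(f"no model configured for language {lang!r}")
        if lang not in self._pipelines:
            # First text in this language: load the pipeline and cache it.
            self._pipelines[lang] = self._load(self._model_names[lang])
        doc = self._pipelines[lang](text)
        return lang, [(ent.text, ent.label_) for ent in doc.ents]
```

With the real libraries installed, this could be wired up as `LazyNER(detect, spacy.load, {"en": "en_core_web_lg", "pt": "pt_core_news_lg"})`, after which `entities(text2)` would load only the Portuguese model.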

Regarding python - Named Entity Recognition (NER) for multiple languages, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66888668/
