
python - Training a custom NER model

Reposted. Author: 行者123. Last updated: 2023-11-30 09:03:19

I have been training my NER model on some text and am trying to find the cities in it using custom entities.

Example:

    ('paragraph Designated Offices Party A New York Party B Delaware paragraph pricing source calculation Market Value shall generally accepted pricing source reasonably agreed parties paragraph Spot rate Spot Rate specified paragraph reasonably agreed parties',
     {'entities': [(37, 41, 'DesignatedBankLoc'), (54, 62, 'CounterpartyBankLoc')]})

I am looking for two entities here, DesignatedBankLoc and CounterpartyBankLoc. A single text can also contain multiple entities.
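Because these annotations are character offsets with an exclusive end index, it can help to slice the text with each span before training. The sketch below does that for the beginning of the example text; `show_spans` is a hypothetical helper name, not part of spaCy. As written, the first span `(37, 41)` appears to cover only `'New '` (with a trailing space), whereas `(37, 45)` would cover `'New York'`, so the offsets may be worth re-checking.

```python
def show_spans(text, entities):
    """Return the substring covered by each (start, end, label) annotation."""
    return [(text[start:end], label) for start, end, label in entities]


# Only the start of the example text is needed to inspect these two spans.
text = "paragraph Designated Offices Party A New York Party B Delaware"
entities = [(37, 41, "DesignatedBankLoc"), (54, 62, "CounterpartyBankLoc")]

for covered, label in show_spans(text, entities):
    # repr() makes stray leading or trailing spaces visible
    print(repr(covered), label)
```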

Currently I am training on 60 rows of data, as follows:

    import random

    import spacy


    def train_spacy(data, iterations):
        # Note: this code is written against the spaCy v2 training API.
        TRAIN_DATA = data
        nlp = spacy.blank('en')  # create blank Language class
        # create the built-in pipeline components and add them to the pipeline
        # nlp.create_pipe works for built-ins that are registered with spaCy
        if 'ner' not in nlp.pipe_names:
            ner = nlp.create_pipe('ner')
            nlp.add_pipe(ner, last=True)
        else:
            ner = nlp.get_pipe('ner')

        # add labels
        for _, annotations in TRAIN_DATA:
            for ent in annotations.get('entities'):
                ner.add_label(ent[2])

        # get names of other pipes to disable them during training
        other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
        with nlp.disable_pipes(*other_pipes):  # only train NER
            optimizer = nlp.begin_training()
            for itn in range(iterations):
                print("Starting iteration " + str(itn))
                random.shuffle(TRAIN_DATA)
                losses = {}
                for text, annotations in TRAIN_DATA:
                    nlp.update(
                        [text],  # batch of texts
                        [annotations],  # batch of annotations
                        drop=0.5,  # dropout - make it harder to memorise data
                        sgd=optimizer,  # callable to update weights
                        losses=losses)
                print(losses)
        return nlp


    prdnlp = train_spacy(TRAIN_DATA, 100)

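My question is:

The model predicts correctly when the input contains cities it was trained on, whether the text follows the same pattern as the training data or a different one. However, it does not predict any entity when the text contains a different city that never appeared in the training data set, again regardless of whether the text pattern is the same or different.

Please tell me why this happens, and help me understand conceptually how the model is being trained.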

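Best answer

From experience: you have 60 rows of data and train for 100 iterations. You are overfitting on the values of the entities rather than on their positions.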

To check this, try injecting the city names at random positions in a sentence and see what happens. If the algorithm still tags them, you are probably overfitting.
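One way to build such probe sentences is sketched below. The helper name `inject_city` and the probe sentence are illustrative assumptions, and the last comment only suggests how the trained `prdnlp` model from the question might be applied to the result.

```python
import random


def inject_city(sentence, city, rng=random):
    """Insert `city` at a random token boundary.

    Returns the new text and the (start, end) character offsets of the city.
    """
    tokens = sentence.split()
    pos = rng.randint(0, len(tokens))  # any gap, including both ends
    new_tokens = tokens[:pos] + [city] + tokens[pos:]
    text = " ".join(new_tokens)
    # Characters before the city: the joined prefix plus one separating space.
    start = len(" ".join(new_tokens[:pos])) + (1 if pos > 0 else 0)
    return text, start, start + len(city)


rng = random.Random(0)  # fixed seed so the probes are reproducible
probe_sentence = "pricing source reasonably agreed parties"
for city in ("New York", "Delaware"):
    text, start, end = inject_city(probe_sentence, city, rng)
    print(text, (start, end))
    # With the trained model, one could then inspect the predictions, e.g.:
    # doc = prdnlp(text); print([(ent.text, ent.label_) for ent in doc.ents])
```

If the model labels the city wherever it lands, even with no "Party A" or "Party B" context around it, that suggests it has memorised the city strings rather than learned where a bank location tends to appear.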

There are two solutions:

  • Create more training data with a wider variety of values for these entities
  • Test different numbers of iterations
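Both suggestions can be sketched in plain Python. The first helper creates extra training examples by swapping the annotated values for other strings and recomputing the offsets; the second sets aside a held-out portion of the data so that models trained for different iteration counts can be compared on text they have not seen. The names `swap_entity_values` and `split_holdout`, the replacement cities, and the 20% split are all assumptions made for illustration, not something the answer prescribes.

```python
import random


def swap_entity_values(text, entities, replacements):
    """Replace each annotated span with a new value and recompute offsets.

    `replacements` maps an entity label to the new surface string.
    """
    pieces, new_entities = [], []
    cursor = 0   # position in the original text
    length = 0   # length of the text built so far
    for start, end, label in sorted(entities):
        pieces.append(text[cursor:start])
        length += start - cursor
        value = replacements.get(label, text[start:end])
        pieces.append(value)
        new_entities.append((length, length + len(value), label))
        length += len(value)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), {"entities": new_entities}


def split_holdout(data, holdout_fraction=0.2, seed=0):
    """Shuffle a copy of `data` and split it into (train, held_out)."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[cut:], shuffled[:cut]


text = "Party A New York Party B Delaware"
entities = [(8, 16, "DesignatedBankLoc"), (25, 33, "CounterpartyBankLoc")]
new_text, new_ann = swap_entity_values(
    text, entities,
    {"DesignatedBankLoc": "London", "CounterpartyBankLoc": "Frankfurt"},
)
print(new_text, new_ann)

# Hypothetical sweep over iteration counts, using train_spacy from the question:
# train, held_out = split_holdout(TRAIN_DATA)
# for n in (10, 20, 50, 100):
#     model = train_spacy(list(train), n)
#     ...compare model predictions on held_out against its annotations...
```

The held-out comparison is most informative when the held-out texts contain cities that do not occur in the training portion, since that is the case the question reports as failing.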

Regarding python - Training a custom NER model, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59151477/
