python - Training spaCy's predefined NER model with custom data: need guidance on the compounding factor, batch size, and loss values

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:23:21

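I am trying to train a spaCy NER model. My data consists of about 2,600 paragraphs, each between 200 and 800 words long. I have to add two new entity labels: product and specification. Is this approach suitable for training, and if not, is there a better alternative? If it is fine, can anyone suggest appropriate values for the compounding factor and batch size? Also, what range should the loss value fall in during training? So far my loss values are between 400 and 5.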

import random
from pathlib import Path

import plac
import spacy
from spacy.util import minibatch, compounding

# NOTE: LABEL (the new entity label) and ret_data (the training examples)
# are not shown in the question; they must be defined before main() runs.
# The calls below follow the spaCy v2 training API.


def main(model=None, new_model_name='product_details_parser',
         output_dir=Path('/xyz_path/'), n_iter=20):
    """Set up the pipeline and entity recognizer, and train the new
    entity."""
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")
    # Add entity recognizer to model if it's not in the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner)
    # otherwise, get it, so we can add labels to it
    else:
        ner = nlp.get_pipe('ner')
    ner.add_label(LABEL)  # add new entity label to entity recognizer
    if model is None:
        optimizer = nlp.begin_training()
    else:
        # Note that 'begin_training' initializes the models, so it'll zero out
        # existing entity types.
        optimizer = nlp.entity.create_optimizer()

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        for itn in range(n_iter):
            random.shuffle(ret_data)
            losses = {}  # accumulates the loss over all batches of this iteration
            # batch up the examples using spaCy's minibatch; the compounding
            # schedule is re-created here, so it restarts at 1 every iteration
            batches = minibatch(ret_data, size=compounding(1., 32., 1.001))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, drop=0.35,
                           losses=losses)
            print('Losses', losses)


if __name__ == '__main__':
    plac.call(main)
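
To get a feel for what compounding(1., 32., 1.001) means for roughly 2,600 examples, the batch-size schedule can be simulated without spaCy. The following is a minimal sketch in plain Python: compounding_sketch, minibatch_sketch, and batch_sizes_for_epoch are hypothetical helpers written for this illustration, assumed to mirror how spacy.util.compounding and spacy.util.minibatch behave (multiply the size by the factor after every batch, cap it at the stop value, truncate to an integer). The second parameter set (4.0, 1.01) is only a contrast case, not a recommendation.

```python
import itertools


def compounding_sketch(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop.

    Hypothetical stand-in for spacy.util.compounding (start <= stop only).
    """
    value = float(start)
    while True:
        yield min(value, stop)
        value *= compound


def minibatch_sketch(items, size):
    """Split items into batches whose sizes are drawn from the `size` generator.

    Hypothetical stand-in for spacy.util.minibatch.
    """
    iterator = iter(items)
    while True:
        batch = list(itertools.islice(iterator, int(next(size))))
        if not batch:
            break
        yield batch


def batch_sizes_for_epoch(n_examples, start, stop, compound):
    """Return the list of batch sizes used in one pass over n_examples."""
    schedule = compounding_sketch(start, stop, compound)
    return [len(b) for b in minibatch_sketch(range(n_examples), schedule)]


# Schedule from the question: starts at 1, grows by 0.1% per batch.
sizes = batch_sizes_for_epoch(2600, 1.0, 32.0, 1.001)
print("batches per epoch:", len(sizes), "largest batch:", max(sizes))

# Illustrative contrast only: a larger start and a faster growth factor.
faster = batch_sizes_for_epoch(2600, 4.0, 32.0, 1.01)
print("batches per epoch:", len(faster), "largest batch:", max(faster))
```

Under these assumptions, a factor of 1.001 grows so slowly that the batch size stays far below the cap of 32 within a single pass over 2,600 examples, and because the posted code builds a new schedule in every iteration, the growth starts over each time. Since the printed loss is accumulated over all batches of an iteration, its absolute value also depends on how much data is processed, so the downward trend is more informative than any fixed range.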

Best answer

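As an alternative to this type of training, you can start with the simple training style ( https://spacy.io/usage/training#training-simple-style ). Compared with your approach, the simple style may take some time, but it will produce better results.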
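
In the simple training style, each example is a raw text paired with a dictionary of annotations, and for NER the "entities" value is a list of (start_char, end_char, label) character offsets. The sketch below shows one way such examples could be assembled; the make_example helper, the sample sentence, and the uppercase label names PRODUCT and SPECIFICATION are assumptions made for illustration and do not come from the question's data.

```python
def make_example(text, phrases):
    """Build one (text, annotations) pair in the simple training style.

    `phrases` is a list of (substring, label) pairs. Only the first
    occurrence of each substring is annotated (a simplification).
    """
    entities = []
    for phrase, label in phrases:
        start = text.find(phrase)
        if start == -1:
            raise ValueError("phrase not found in text: %r" % phrase)
        entities.append((start, start + len(phrase), label))
    return text, {"entities": entities}


# Hypothetical sample sentence and labels.
sample_text = "The X200 drill has a 750 W motor."
example = make_example(
    sample_text,
    [("X200 drill", "PRODUCT"), ("750 W motor", "SPECIFICATION")],
)

# Sanity check: every offset pair must slice out the intended phrase.
text, annotations = example
for start, end, label in annotations["entities"]:
    print(label, repr(text[start:end]))
```

A list of such pairs has the same shape as the ret_data that the question's training loop unpacks with zip(*batch), so offsets can be checked this way before training.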

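This post on "python - Training spaCy's predefined NER model with custom data: need guidance on the compounding factor, batch size, and loss values" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/54053415/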
