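I am trying to build a generic extractor for the following parameters from payslips in any format:
The challenge is that many different layouts can occur, so I want to apply NER (spaCy) to learn these fields as entities.
So far I have not succeeded. I even tried building custom EntityMatchers for postcodes and dates, without success.
I am looking for guidelines and an approach that put me on the right path to meet the requirement above, i.e. what is the correct and best way to achieve this with machine learning?
The custom NER snippet I tried to build: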
import spacy
import random
import threading
import time
from DateEntityMatcher import DateEntityMatcher
from PostCodeEntityMatcher import PostCodeEntityMatcher


class IncomeValidatorModel(object):
    """ Threading example class
    The run() method will be started and it will run in the background
    until the application exits.
    """

    def __init__(self, interval=1):
        """ Constructor
        :type interval: int
        :param interval: Check interval, in seconds
        """
        self.interval = interval
        thread = threading.Thread(target=self.run, args=())
        thread.daemon = True  # Daemonize thread
        thread.start()  # Start the execution

    def run(self):
        """ Method that runs forever """
        while True:
            # Do something
            print('Doing something important in the background')
            DATA = [
(u"PRIVATE & CONFIDENTIAL REF. No. DEPT SITE PAY DATE 82521 002 31/07/2019 MR M HASAN 69 ALCOMBE ROAD NORTHAMPTON UK NN1 3LE CONFIDENTIAL PAY ADVICE MR M HASAN CAPGEMINI UK PLC EMP REFERENCE TAXDISTRICT TAXREFERENCE D83/82521 475/VB53759 TAXABLE PAY 14297.14 AY DATE 31/07/2019 TAX PERIOD 2019-04 ANN. SALARY 49650.00 TAX PAID 1611.40 PAY METHOD BACS TAX CODE 1871L PAY PERIOD MONTHLY N.I. EMPLOYEE 1365.96 N.I. NUMBER SY095026C CONTRACT HRS 40.00 PERIOD PAY 4137.50 N.I. EMPLOYER 1576.11 N.I. TABLE A O/TIME RATE 23.8702 HOURLY RATE 23.8702 PAYMENTS DEDUCTIONS DESCRIPTION HRS/UNITS RATE VALUE TO DATE DESCRIPTION VALUE BAL ANCE TO DATE BENEFIT ALLOW 620.67 706.61 NAT.INS 385.84 1365.96 DISP NT -353.08 -1253.08 P.A.Y.E. 474.80 1611.40 SALARY 4137.50 16514.38 ACCOM NT -470.77 -1670.77 GROSS PAY 4758.17 TOTAL DEDUCTIONS 860.64 NET PAY 3897.53",
{'entities': [(203, 218, 'ORG'), (100, 106, 'PERSON'), (1097, 1103, 'MONEY')]}),
(u"PRIVATE & CONFIDENTIAL REF. No. DEPT SITE PAY DATE 82521 002 31/07/2019 MR M HASAN 69 ALCOMBE ROAD NORTHAMPTON UK NN1 3LE CONFIDENTIAL PAY ADVICE MR K KHANA CAPGEMINI UK PLC EMP REFERENCE TAXDISTRICT TAXREFERENCE D83/82521 475/VB53759 TAXABLE PAY 14297.14 AY DATE 31/07/2019 TAX PERIOD 2019-04 ANN. SALARY 49650.00 TAX PAID 1611.40 PAY METHOD BACS TAX CODE 1871L PAY PERIOD MONTHLY N.I. EMPLOYEE 1365.96 N.I. NUMBER SY095026C CONTRACT HRS 40.00 PERIOD PAY 4137.50 N.I. EMPLOYER 1576.11 N.I. TABLE A O/TIME RATE 23.8702 HOURLY RATE 23.8702 PAYMENTS DEDUCTIONS DESCRIPTION HRS/UNITS RATE VALUE TO DATE DESCRIPTION VALUE BAL ANCE TO DATE BENEFIT ALLOW 620.67 706.61 NAT.INS 385.84 1365.96 DISP NT -353.08 -1253.08 P.A.Y.E. 474.80 1611.40 SALARY 4137.50 16514.38 ACCOM NT -470.77 -1670.77 GROSS PAY 4758.17 TOTAL DEDUCTIONS 860.64 NET PAY 3897.53",
{'entities': [(203, 218, 'ORG'), (100, 106, 'PERSON'), (1097, 1103, 'MONEY')]}),
(u"PRIVATE & CONFIDENTIAL REF. No. DEPT SITE PAY DATE 82521 002 31/07/2019 MR M HASAN 69 ALCOMBE ROAD NORTHAMPTON UK NN1 3LE CONFIDENTIAL PAY ADVICE MR M MENON CAPGEMINI UK PLC EMP REFERENCE TAXDISTRICT TAXREFERENCE D83/82521 475/VB53759 TAXABLE PAY 14297.14 AY DATE 31/07/2019 TAX PERIOD 2019-04 ANN. SALARY 49650.00 TAX PAID 1611.40 PAY METHOD BACS TAX CODE 1871L PAY PERIOD MONTHLY N.I. EMPLOYEE 1365.96 N.I. NUMBER SY095026C CONTRACT HRS 40.00 PERIOD PAY 4137.50 N.I. EMPLOYER 1576.11 N.I. TABLE A O/TIME RATE 23.8702 HOURLY RATE 23.8702 PAYMENTS DEDUCTIONS DESCRIPTION HRS/UNITS RATE VALUE TO DATE DESCRIPTION VALUE BAL ANCE TO DATE BENEFIT ALLOW 620.67 706.61 NAT.INS 385.84 1365.96 DISP NT -353.08 -1253.08 P.A.Y.E. 474.80 1611.40 SALARY 4137.50 16514.38 ACCOM NT -470.77 -1670.77 GROSS PAY 4758.17 TOTAL DEDUCTIONS 860.64 NET PAY 3897.53",
{'entities': [(203, 218, 'ORG'), (100, 106, 'PERSON'), (1097, 1103, 'MONEY')]}),
(u"PRIVATE & CONFIDENTIAL REF. No. DEPT SITE PAY DATE 82521 002 31/07/2019 MR M HASAN 69 ALCOMBE ROAD NORTHAMPTON UK NN1 3LE CONFIDENTIAL PAY ADVICE MR F JAHAN CAPGEMINI UK PLC EMP REFERENCE TAXDISTRICT TAXREFERENCE D83/82521 475/VB53759 TAXABLE PAY 14297.14 AY DATE 31/07/2019 TAX PERIOD 2019-04 ANN. SALARY 49650.00 TAX PAID 1611.40 PAY METHOD BACS TAX CODE 1871L PAY PERIOD MONTHLY N.I. EMPLOYEE 1365.96 N.I. NUMBER SY095026C CONTRACT HRS 40.00 PERIOD PAY 4137.50 N.I. EMPLOYER 1576.11 N.I. TABLE A O/TIME RATE 23.8702 HOURLY RATE 23.8702 PAYMENTS DEDUCTIONS DESCRIPTION HRS/UNITS RATE VALUE TO DATE DESCRIPTION VALUE BAL ANCE TO DATE BENEFIT ALLOW 620.67 706.61 NAT.INS 385.84 1365.96 DISP NT -353.08 -1253.08 P.A.Y.E. 474.80 1611.40 SALARY 4137.50 16514.38 ACCOM NT -470.77 -1670.77 GROSS PAY 4758.17 TOTAL DEDUCTIONS 860.64 NET PAY 3897.53",
{'entities': [(203, 218, 'ORG'), (100, 106, 'PERSON'), (1097, 1103, 'MONEY')]}),
(u"PRIVATE & CONFIDENTIAL REF. No. DEPT SITE PAY DATE 82521 002 31/07/2019 MR M HASAN 69 ALCOMBE ROAD NORTHAMPTON UK NN1 3LE CONFIDENTIAL PAY ADVICE MR A JAHAN CAPGEMINI UK PLC EMP REFERENCE TAXDISTRICT TAXREFERENCE D83/82521 475/VB53759 TAXABLE PAY 14297.14 AY DATE 31/07/2019 TAX PERIOD 2019-04 ANN. SALARY 49650.00 TAX PAID 1611.40 PAY METHOD BACS TAX CODE 1871L PAY PERIOD MONTHLY N.I. EMPLOYEE 1365.96 N.I. NUMBER SY095026C CONTRACT HRS 40.00 PERIOD PAY 4137.50 N.I. EMPLOYER 1576.11 N.I. TABLE A O/TIME RATE 23.8702 HOURLY RATE 23.8702 PAYMENTS DEDUCTIONS DESCRIPTION HRS/UNITS RATE VALUE TO DATE DESCRIPTION VALUE BAL ANCE TO DATE BENEFIT ALLOW 620.67 706.61 NAT.INS 385.84 1365.96 DISP NT -353.08 -1253.08 P.A.Y.E. 474.80 1611.40 SALARY 4137.50 16514.38 ACCOM NT -470.77 -1670.77 GROSS PAY 4758.17 TOTAL DEDUCTIONS 860.64 NET PAY 3897.53",
{'entities': [(203, 218, 'ORG'), (100, 106, 'PERSON'), (1097, 1103, 'MONEY')]}),
(u"PRIVATE & CONFIDENTIAL REF. No. DEPT SITE PAY DATE 82521 002 31/07/2019 MR M HASAN 69 ALCOMBE ROAD NORTHAMPTON UK NN1 3LE CONFIDENTIAL PAY ADVICE MR M HASAN CAPGEMINI UK PLC EMP REFERENCE TAXDISTRICT TAXREFERENCE D83/82521 475/VB53759 TAXABLE PAY 14297.14 AY DATE 31/07/2019 TAX PERIOD 2019-04 ANN. SALARY 49650.00 TAX PAID 1611.40 PAY METHOD BACS TAX CODE 1871L PAY PERIOD MONTHLY N.I. EMPLOYEE 1365.96 N.I. NUMBER SY095026C CONTRACT HRS 40.00 PERIOD PAY 4137.50 N.I. EMPLOYER 1576.11 N.I. TABLE A O/TIME RATE 23.8702 HOURLY RATE 23.8702 PAYMENTS DEDUCTIONS DESCRIPTION HRS/UNITS RATE VALUE TO DATE DESCRIPTION VALUE BAL ANCE TO DATE BENEFIT ALLOW 620.67 706.61 NAT.INS 385.84 1365.96 DISP NT -353.08 -1253.08 P.A.Y.E. 474.80 1611.40 SALARY 4137.50 16514.38 ACCOM NT -470.77 -1670.77 GROSS PAY 4758.17 TOTAL DEDUCTIONS 860.64 NET PAY 3897.53",
{'entities': [(203, 218, 'ORG'), (100, 106, 'PERSON'), (1097, 1103, 'MONEY')]}),
(u"PRIVATE & CONFIDENTIAL REF. No. DEPT SITE PAY DATE 82521 002 31/07/2019 MR M HASAN 69 ALCOMBE ROAD NORTHAMPTON UK NN1 3LE CONFIDENTIAL PAY ADVICE MR M HASAN CAPGEMINI UK PLC EMP REFERENCE TAXDISTRICT TAXREFERENCE D83/82521 475/VB53759 TAXABLE PAY 14297.14 AY DATE 31/07/2019 TAX PERIOD 2019-04 ANN. SALARY 49650.00 TAX PAID 1611.40 PAY METHOD BACS TAX CODE 1871L PAY PERIOD MONTHLY N.I. EMPLOYEE 1365.96 N.I. NUMBER SY095026C CONTRACT HRS 40.00 PERIOD PAY 4137.50 N.I. EMPLOYER 1576.11 N.I. TABLE A O/TIME RATE 23.8702 HOURLY RATE 23.8702 PAYMENTS DEDUCTIONS DESCRIPTION HRS/UNITS RATE VALUE TO DATE DESCRIPTION VALUE BAL ANCE TO DATE BENEFIT ALLOW 620.67 706.61 NAT.INS 385.84 1365.96 DISP NT -353.08 -1253.08 P.A.Y.E. 474.80 1611.40 SALARY 4137.50 16514.38 ACCOM NT -470.77 -1670.77 GROSS PAY 4758.17 TOTAL DEDUCTIONS 860.64 NET PAY 3897.53",
{'entities': [(203, 218, 'ORG'), (100, 106, 'PERSON'), (1097, 1103, 'MONEY')]}),
(u"PRIVATE & CONFIDENTIAL REF. No. DEPT SITE PAY DATE 82521 002 31/07/2019 MR M HASAN 69 ALCOMBE ROAD NORTHAMPTON UK NN1 3LE CONFIDENTIAL PAY ADVICE MR M HASAN CAPGEMINI UK PLC EMP REFERENCE TAXDISTRICT TAXREFERENCE D83/82521 475/VB53759 TAXABLE PAY 14297.14 AY DATE 31/07/2019 TAX PERIOD 2019-04 ANN. SALARY 49650.00 TAX PAID 1611.40 PAY METHOD BACS TAX CODE 1871L PAY PERIOD MONTHLY N.I. EMPLOYEE 1365.96 N.I. NUMBER SY095026C CONTRACT HRS 40.00 PERIOD PAY 4137.50 N.I. EMPLOYER 1576.11 N.I. TABLE A O/TIME RATE 23.8702 HOURLY RATE 23.8702 PAYMENTS DEDUCTIONS DESCRIPTION HRS/UNITS RATE VALUE TO DATE DESCRIPTION VALUE BAL ANCE TO DATE BENEFIT ALLOW 620.67 706.61 NAT.INS 385.84 1365.96 DISP NT -353.08 -1253.08 P.A.Y.E. 474.80 1611.40 SALARY 4137.50 16514.38 ACCOM NT -470.77 -1670.77 GROSS PAY 4758.17 TOTAL DEDUCTIONS 860.64 NET PAY 3897.53",
{'entities': [(203, 218, 'ORG'), (100, 106, 'PERSON'), (1097, 1103, 'MONEY')]}),
(u"Sample Payslip Matrix House Basing View Basingstoke Hampshire RG21 4FF Advantage Resourcing 6th Floor, Matrix House, Basing View, Basingstoke, Hampshire, RG21 4FF Registered Number 03341461 COMPANY DIVISION Advantage Resourcing UK SWINDON WORKER NO. NAME PERIOD PAY DATE IND 123456 Sample Payslip 14/2016 08/07/2016 W1 DEPARTMENT TAX CODE N.I. NO./TABLE LETTER NAT 1100L JA123456A/A PAYMENTS DEDUCTIONS Wk Ending Timesheet Description Units Rate Amount Deduction Amount 03/07/2016 GEN000499628 Hourly Rate 40.00 10.00 400.00 Tax 87.60 03/07/2016 GEN000499628 Week Day Overtime 10.00 15.00 150.00 NI 59.40 03/07/2016 GEN000499628 Saturday Overtime 5.00 20.00 100.00 TOTAL PAYMENTS 650.00 TOTAL DEDUCTIONS 147.00 CUMULATIVES GROSS TO DATE 650.00 Current Holiday Entitlement: 0.00 Unit(s) TAXABLE PAY TO DATE 650.00 EE PENSION TO DATE 0.00 ER PENSION TO DATE 0.00 TAX TO DATE 87.60 TO DATE 68.17 TO DATE 59.40 c Safe Computing Limited 2002 NET PAY 503.00",
{'entities': [(89, 109, 'ORG'), (0, 14, 'PERSON'), (1186, 1191, 'MONEY')]}),
(u"Mubssar Hasan Matrix House Basing View Basingstoke Hampshire RG21 4FF Advantage Resourcing 6th Floor, Matrix House, Basing View, Basingstoke, Hampshire, RG21 4FF Registered Number 03341461 COMPANY DIVISION Advantage Resourcing UK SWINDON WORKER NO. NAME PERIOD PAY DATE IND 123456 Sample Payslip 14/2016 08/07/2016 W1 DEPARTMENT TAX CODE N.I. NO./TABLE LETTER NAT 1100L JA123456A/A PAYMENTS DEDUCTIONS Wk Ending Timesheet Description Units Rate Amount Deduction Amount 03/07/2016 GEN000499628 Hourly Rate 40.00 10.00 400.00 Tax 87.60 03/07/2016 GEN000499628 Week Day Overtime 10.00 15.00 150.00 NI 59.40 03/07/2016 GEN000499628 Saturday Overtime 5.00 20.00 100.00 TOTAL PAYMENTS 650.00 TOTAL DEDUCTIONS 147.00 CUMULATIVES GROSS TO DATE 650.00 Current Holiday Entitlement: 0.00 Unit(s) TAXABLE PAY TO DATE 650.00 EE PENSION TO DATE 0.00 ER PENSION TO DATE 0.00 TAX TO DATE 87.60 TO DATE 68.17 TO DATE 59.40 c Safe Computing Limited 2002 NET PAY 503.00",
{'entities': [(88, 108, 'ORG'), (0, 13, 'PERSON'), (1186, 1191, 'MONEY')]}),
(u"Oracle Corp Anil Menon Work Date 01/09/2019 PAYMENTS Tax 100 Net Pay 2000",
{'entities': [(0, 10, 'ORG'), (12, 21, 'PERSON'), (69, 72, 'MONEY')]}),
(u"Huawei Corp Anil Menon Work Date 01/06/2019 PAYMENTS Tax 100 Net Pay 1900",
{'entities': [(0, 10, 'ORG'), (12, 21, 'PERSON'), (69, 72, 'MONEY')]}),
(u"Tata Corp Nitin Garg Work Date 20/04/2019 PAYMENTS Tax 100 Net Pay 1900",
{'entities': [(0, 8, 'ORG'), (10, 19, 'PERSON'), (67, 70, 'MONEY')]}),
(u"Accenture Corp Amol Joshi Work Date 20/04/2019 PAYMENTS Tax 100 Net Pay 900",
{'entities': [(0, 15, 'ORG'), (17, 26, 'PERSON'), (72, 74, 'MONEY')]}),
(u"Cognizant Corp Anup Nair Work Date 20/04/2019 PAYMENTS Tax 100 Net Pay 900",
{'entities': [(0, 15, 'ORG'), (17, 25, 'PERSON'), (71, 73, 'MONEY')]}),
(u"Cognizant Corp Sajit Kumar Work Date 20/04/2019 PAYMENTS Tax 100 Net Pay 1900",
{'entities': [(0, 15, 'ORG'), (17, 27, 'PERSON'), (73, 76, 'MONEY')]}),
(u"Tata Corp Saurabh Dave Work Date 20/04/2019 PAYMENTS Tax 100 Net Pay 1300",
{'entities': [(0, 8, 'ORG'), (10, 21, 'PERSON'), (69, 72, 'MONEY')]}),
(u"Capgemini PLC Mubashshir Hasan Work Date 20/04/2019 PAYMENTS Tax 100 Net Pay 1700",
{'entities': [(0, 12, 'ORG'), (14, 29, 'PERSON'), (77, 80, 'MONEY')]}),
(u"Capgemini PLC Sagar Pande Work Date 20/04/2019 PAYMENTS Tax 100 Net Pay 1700",
{'entities': [(0, 12, 'ORG'), (14, 24, 'PERSON'), (72, 75, 'MONEY')]}),
(u"Capgemini PLC Sreeram Yegappan Work Date 20/04/2019 PAYMENTS Tax 100 Net Pay 2000",
{'entities': [(0, 12, 'ORG'), (14, 29, 'PERSON'), (77, 80, 'MONEY')]})
            ]
            # nlp = spacy.blank('en')  # new, empty model. Let's say it's for the English language
            global nlp
            nlp = spacy.load('en_core_web_sm')
            nlp.entity.add_label('ORG')
            nlp.entity.add_label('PERSON')
            nlp.entity.add_label('MONEY')
            # add NER pipeline
            # ner = nlp.create_pipe('ner')  # our pipeline would just do NER
            # nlp.add_pipe(ner, last=True)  # we add the pipeline to the model
            postcde_entity_matcher = PostCodeEntityMatcher(nlp, ['NN1 3LE', 'NN2 8HF', 'IG3 8TH', 'NN4 7YH', 'RG21 5GH'], 'POSTCDE')
            nlp.entity.add_label('POSTCDE')
            nlp.add_pipe(postcde_entity_matcher, before='ner')
            date_entity_matcher = DateEntityMatcher(nlp, ['20/04/2019', '20/04/2019', '25/04/2016', '20/04/2019', '20/07/2019', '20/12/2019'], 'DATE')
            nlp.entity.add_label('DATE')
            nlp.add_pipe(date_entity_matcher, before='ner')
            optimizer = nlp.begin_training()
            for i in range(11):
                random.shuffle(DATA)
                for text, annotations in DATA:
                    nlp.update([text], [annotations], sgd=optimizer)
            time.sleep(self.interval)

    def extractPayslipData(self, data):
        doc = nlp(data)
        for entity in doc.ents:
            print(entity.label_, ' | ', entity.text)
        return doc.ents
Best answer
The training JSON (x.json) should look like this:
[{
"text": "PRIVATE & CONFIDENTIAL REF. No. DEPT SITE PAY DATE 82521 002 31/07/2019 MR M HASAN 69 ALCOMBE ROAD NORTHAMPTON UK NN1 3LE CONFIDENTIAL PAY ADVICE MR M HASAN CAPGEMINI UK PLC EMP REFERENCE TAXDISTRICT TAXREFERENCE D83/82521 475/VB53759 TAXABLE PAY 14297.14 AY DATE 31/07/2019 TAX PERIOD 2019-04 ANN. SALARY 49650.00 TAX PAID 1611.40 PAY METHOD BACS TAX CODE 1871L PAY PERIOD MONTHLY N.I. EMPLOYEE 1365.96 N.I. NUMBER SY095026C CONTRACT HRS 40.00 PERIOD PAY 4137.50 N.I. EMPLOYER 1576.11 N.I. TABLE A O/TIME RATE 23.8702 HOURLY RATE 23.8702 PAYMENTS DEDUCTIONS DESCRIPTION HRS/UNITS RATE VALUE TO DATE DESCRIPTION VALUE BAL ANCE TO DATE BENEFIT ALLOW 620.67 706.61 NAT.INS 385.84 1365.96 DISP NT -353.08 -1253.08 P.A.Y.E. 474.80 1611.40 SALARY 4137.50 16514.38 ACCOM NT -470.77 -1670.77 GROSS PAY 4758.17 TOTAL DEDUCTIONS 860.64 NET PAY 3897.53",
"entities": [
[
191,
198,
"PERSON"
],
[
202,
211,
"ORG"
],
[
150,
157,
"POST_CODE"
],
[
1096,
1103,
"MONEY"
]]
},
{
"text": "Mubssar Hasan Matrix House Basing View Basingstoke Hampshire RG21 4FF Advantage Resourcing 6th Floor, Matrix House, Basing View, Basingstoke, Hampshire, RG21 4FF Registered Number 03341461 COMPANY DIVISION Advantage Resourcing UK SWINDON WORKER NO. NAME PERIOD PAY DATE IND 123456 Sample Payslip 14/2016 08/07/2016 W1 DEPARTMENT TAX CODE N.I. NO./TABLE LETTER NAT 1100L JA123456A/A PAYMENTS DEDUCTIONS Wk Ending Timesheet Description Units Rate Amount Deduction Amount 03/07/2016 GEN000499628 Hourly Rate 40.00 10.00 400.00 Tax 87.60 03/07/2016 GEN000499628 Week Day Overtime 10.00 15.00 150.00 NI 59.40 03/07/2016 GEN000499628 Saturday Overtime 5.00 20.00 100.00 TOTAL PAYMENTS 650.00 TOTAL DEDUCTIONS 147.00 CUMULATIVES GROSS TO DATE 650.00 Current Holiday Entitlement: 0.00 Unit(s) TAXABLE PAY TO DATE 650.00 EE PENSION TO DATE 0.00 ER PENSION TO DATE 0.00 TAX TO DATE 87.60 TO DATE 68.17 TO DATE 59.40 c Safe Computing Limited 2002 NET PAY 503.00",
"entities": [
[
1,
13,
"PERSON"
],
[
88,
108,
"ORG"
],
[
150,
157,
"POST_CODE"
],
[
1186,
1192,
"MONEY"
]]
}
]
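Hand-counted character offsets are easy to get wrong, and spaCy's NER training generally expects each span to line up with token boundaries. The sketch below derives the offsets from the entity strings themselves instead of counting by hand. `find_span` and `build_example` are hypothetical helper names, not part of spaCy, and the sketch assumes each entity string occurs in the text at least once (the first occurrence is used unless an `occurrence` index is passed).

```python
def find_span(text, phrase, label, occurrence=0):
    """Return [start, end, label] for the requested occurrence of phrase in text."""
    start = -1
    for _ in range(occurrence + 1):
        # Search onward from just past the previous hit.
        start = text.find(phrase, start + 1)
        if start == -1:
            raise ValueError(f"{phrase!r} not found (occurrence {occurrence})")
    return [start, start + len(phrase), label]


def build_example(text, labelled_phrases):
    """Build one record in the {"text": ..., "entities": [...]} shape used above."""
    entities = [find_span(text, phrase, label) for phrase, label in labelled_phrases]
    return {"text": text, "entities": entities}


sample = "Oracle Corp Anil Menon Work Date 01/09/2019 PAYMENTS Tax 100 Net Pay 2000"
record = build_example(sample, [
    ("Oracle Corp", "ORG"),
    ("Anil Menon", "PERSON"),
    ("2000", "MONEY"),
])

# Each slice reproduces the annotated string exactly, so the end offsets are exclusive.
for start, end, label in record["entities"]:
    print(label, repr(sample[start:end]), start, end)
```

Records built this way can be dumped with `json.dump` into the same file layout, so the training code does not need to change.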
Code:
# Written against the spaCy 2.x API (GoldParse, create_pipe, begin_training).
import json
import random
from pathlib import Path

import spacy
from spacy.gold import GoldParse

# training_pickel_file is assumed to hold the path to the training JSON shown above (x.json)
with open(training_pickel_file) as f:
    TRAIN_DATA = json.load(f)


def main(model=None, output_dir="/home/NLP/model", n_iter=50):
    if model is not None:
        nlp = spacy.load(model)
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")

    # create the NER pipe if the model does not have one yet
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)
    # otherwise, get it so we can add labels
    else:
        ner = nlp.get_pipe('ner')

    # register every label that appears in the training data
    for annotations in TRAIN_DATA:
        for ent in annotations["entities"]:
            ner.add_label(ent[2])
    print(ner)

    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            losses = {}
            for a in TRAIN_DATA:
                doc = nlp.make_doc(a["text"])
                gold = GoldParse(doc, entities=a["entities"])
                nlp.update([doc], [gold], drop=0.5, sgd=optimizer, losses=losses)
            print('Losses', losses)

    # save the trained model to disk
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.to_disk(output_dir)


if __name__ == "__main__":
    main()
Testing the model:
sen = ["""PRIVATE & CONFIDENTIAL REF. No. DEPT SITE PAY DATE 82521 002 31/07/2019 MR M HASAN 69 ALCOMBE ROAD NORTHAMPTON UK NN1 3LE CONFIDENTIAL PAY ADVICE MR M HASAN CAPGEMINI UK PLC EMP REFERENCE COME A TAXDISTRICT TAXREFERENCE D83/82521 475/VB53759 TAXABLE PAY 14297.14 AY DATE 31/07/2019 TAX PERIOD 2019-04 ANN. SALARY 49650.00 TAX PAID 1611.40 PAY METHOD BACS TAX CODE 1871L PAY PERIOD MONTHLY N.I. EMPLOYEE 1365.96 N.I. NUMBER SY095026C CONTRACT HRS 40.00 PERIOD PAY 4137.50 N.I. EMPLOYER 1576.11 N.I. TABLE A O/TIME RATE 23.8702 HOURLY RATE 23.8702 PAYMENTS DEDUCTIONS DESCRIPTION HRS/UNITS RATE VALUE TO DATE DESCRIPTION VALUE BAL ANCE TO DATE BENEFIT ALLOW 620.67 706.61 NAT.INS 385.84 1365.96 DISP NT -353.08 -1253.08 P.A.Y.E. 474.80 1611.40 SALARY 4137.50 16514.38 ACCOM NT -470.77 -1670.77 GROSS PAY 4758.17 TOTAL DEDUCTIONS 860.64 NET PAY 3897.53"""]
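# Load the model saved by the training step (path assumed to match output_dir above)
nlp = spacy.load("/home/NLP/model")

for text in sen:
    doc = nlp(text)
    entity = {}
    for ent in doc.ents:
        # group every match under its label instead of keeping only the last one
        entity.setdefault(ent.label_, []).append(ent.text)
    print(entity)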
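Postcodes and dates follow fixed patterns, so a rule-based pass may be more dependable for them than a learned entity, with NER reserved for free-form fields such as names and employers. The following is a minimal sketch using only Python's `re` module. The `POSTCODE_RE` pattern is a simplified approximation of UK postcodes (an assumption, not the full official format), `DATE_RE` assumes dd/mm/yyyy dates as in the sample payslips, and `extract_patterns` is a hypothetical helper name.

```python
import re

# Simplified UK postcode shape: 1-2 letters, a digit, an optional letter or digit,
# one whitespace character, a digit, then 2 letters (e.g. "NN1 3LE", "RG21 4FF").
POSTCODE_RE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s\d[A-Z]{2}\b")
# Dates written as dd/mm/yyyy; the day and month ranges are not validated here.
DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def extract_patterns(text):
    """Return unique postcode and date strings in order of first appearance."""
    return {
        "POST_CODE": list(dict.fromkeys(POSTCODE_RE.findall(text))),
        "DATE": list(dict.fromkeys(DATE_RE.findall(text))),
    }


sample = ("PAY DATE 82521 002 31/07/2019 MR M HASAN 69 ALCOMBE ROAD "
          "NORTHAMPTON UK NN1 3LE CONFIDENTIAL PAY ADVICE")
result = extract_patterns(sample)
print(result)
```

The results of such a pass can be merged with the dictionary printed by the model test, which avoids having to annotate postcodes and dates in the training data at all.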
Regarding "python-3.x - Need an approach to build a custom NER for extracting the following keywords from payslips in any format", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57904642/