python - Extracting medical information using Python

Reposted. Author: 太空狗. Updated: 2023-10-29 17:27:10

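I am a nurse and I know Python, but I am not an expert; I have only used it to process DNA sequences.
We received hospital records written in natural language, and I am supposed to enter the data into a database or a CSV file. There are more than 5000 lines, so doing this by hand would be very hard. All the records are written in a consistent format. Here is an example: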

11/11/2010 - 09:00am : He got nausea, vomiting and died 4 hours later

I should get the following data:

Sex: Male
Symptoms: Nausea
          Vomiting
Death: True
Death Time: 11/11/2010 - 01:00pm

Another example:

11/11/2010 - 09:00am : She got heart burn, vomiting of blood and died 1 hours later in the operation room

And I should get:

Sex: Female
Symptoms: Heart burn
          Vomiting of blood
Death: True
Death Time: 11/11/2010 - 10:00am

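The order of the clauses is not consistent. When a record says in ......., in is a keyword, and all the text after it is a place, until I find another keyword.
At the beginning, He or She determines the sex. After got ........, whatever follows is a group of symptoms that I should split on the separator, which can be a comma, a hyphen, or something else, but is consistent within a line.
For died ..... hours later I should also get the number of hours. Sometimes the patient is still alive and is discharged, etc.
That is to say, we have a lot of conventions, and I think that if I can tokenize the text with keywords and patterns, I can get the job done. So if you know a useful function, module, tutorial, or tool for doing this, preferably in Python (if not Python, a GUI tool would be nice), please let me know.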

Some information:

There are a lot of rules for expressing various medical data, but here are a few examples:
- Each record starts with the same date/time format, followed by a space, a colon, a space, He/She, a space, and then rules separated by "and"
- Rules:
* got <symptoms>,<symptoms>,....
* investigations were done <investigation>,<investigation>,<investigation>,......
* received <drug or procedure>,<drug or procedure>,.....
* discharged <digit> (hour|hours) later
* kept under observation
* died <digit> (hour|hours) later
* died <digit> (hour|hours) later in <place>
Other rules exist, but they follow the same idea.
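
The rules above can be sketched as a small keyword table plus one pattern for the timed events. This is only a minimal sketch using the standard library, not a complete solution: the names `parse_record`, `LIST_RULES`, `TIMED_PAT` and the output field names are hypothetical, the date format is assumed to be month/day/year, and a symptom that itself contains the word "and" would be split incorrectly.

```python
import re
from datetime import datetime, timedelta

# All names here are hypothetical; the keywords mirror the rules listed above.
DATE_FORMAT = '%m/%d/%Y - %I:%M%p'  # assumption: month/day/year
RECORD_PAT = re.compile(
    r'^(?P<when>.+?)\s+:\s+(?P<who>he|she)\s+(?P<body>.*)$', re.IGNORECASE)
TIMED_PAT = re.compile(
    r'^(?P<event>died|discharged)\s+(?P<hours>\d+)\s+hours?\s+later'
    r'(?:\s+in\s+(?P<place>.+))?$', re.IGNORECASE)
LIST_RULES = {
    'got': 'Symptoms',
    'investigations were done': 'Investigations',
    'received': 'Treatments',
}


def parse_record(line):
    """Parse one record line into a dict, following the rules above."""
    match = RECORD_PAT.match(line.strip())
    if match is None:
        raise ValueError('unrecognised record: %r' % line)
    recorded = datetime.strptime(match.group('when'), DATE_FORMAT)
    sex = 'Male' if match.group('who').lower() == 'he' else 'Female'
    result = {'Record Time': recorded, 'Sex': sex, 'Death': False}
    # Rules are separated by "and"; items inside a rule by "," or "-".
    for clause in re.split(r'\s+and\s+', match.group('body').strip()):
        timed = TIMED_PAT.match(clause)
        if timed:
            died = timed.group('event').lower() == 'died'
            event = 'Death' if died else 'Discharge'
            result[event] = True
            result[event + ' Time'] = recorded + timedelta(
                hours=int(timed.group('hours')))
            if timed.group('place'):
                result[event + ' Place'] = timed.group('place')
        elif clause.lower() == 'kept under observation':
            result['Under Observation'] = True
        else:
            for keyword, field in LIST_RULES.items():
                if clause.lower().startswith(keyword + ' '):
                    items = re.split(r'\s*[,-]\s*',
                                     clause[len(keyword):].strip())
                    result[field] = [item for item in items if item]
                    break
            else:
                # Keep anything no rule recognises so it can be reviewed.
                result.setdefault('Unparsed', []).append(clause)
    return result


record = parse_record(
    '11/11/2010 - 09:00am : He got nausea, vomiting and died 4 hours later')
print(record['Symptoms'], record['Death Time'])
```

For the first example record, `record['Symptoms']` is `['nausea', 'vomiting']` and `record['Death Time']` is `datetime(2010, 11, 11, 13, 0)`. Clauses that match no rule are collected under `'Unparsed'` rather than dropped, so unusual lines can be checked by hand.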

Best answer

This uses dateutil to parse the dates (e.g. "11/11/2010 - 09:00am") and parsedatetime to parse the relative times (e.g. "4 hours later"):

import datetime
import pprint
import re
import sys
import time

import dateutil.parser as dparser
# The original answer imported parsedatetime.parsedatetime and
# parsedatetime.parsedatetime_consts; current releases expose Calendar
# at the top level of the package.
import parsedatetime as pdt

pdt_parser = pdt.Calendar()
# Everything before the first " :" is the record's date and time.
record_time_pat = re.compile(r'^(.+)\s+:')
sex_pat = re.compile(r'\b(he|she)\b', re.IGNORECASE)
death_time_pat = re.compile(r'died\s+(.+hours later).*$', re.IGNORECASE)
# Symptoms are separated by commas or hyphens.
symptom_pat = re.compile(r'[,-]')


def parse_record(astr):
    match = record_time_pat.match(astr)
    if match:
        record_time = dparser.parse(match.group(1))
        astr, _ = record_time_pat.subn('', astr, 1)
    else:
        sys.exit('Can not find record time')
    match = sex_pat.search(astr)
    if match:
        sex = match.group(1)
        sex = 'Female' if sex.lower().startswith('s') else 'Male'
        astr, _ = sex_pat.subn('', astr, 1)
    else:
        sys.exit('Can not find sex')
    match = death_time_pat.search(astr)
    if match:
        # Resolve the relative time against the record time.
        death_time, date_type = pdt_parser.parse(match.group(1), record_time)
        if date_type == 2:
            # Flag 2 means a time was parsed; convert struct_time to datetime.
            death_time = datetime.datetime.fromtimestamp(
                time.mktime(death_time))
        astr, _ = death_time_pat.subn('', astr, 1)
        is_dead = True
    else:
        death_time = None
        is_dead = False
    # Remove "and" only as a whole word, so words containing it stay intact.
    astr = re.sub(r'\band\b', '', astr)
    symptoms = [s.strip() for s in symptom_pat.split(astr)]
    return {'Record Time': record_time,
            'Sex': sex,
            'Death Time': death_time,
            'Symptoms': symptoms,
            'Death': is_dead}


if __name__ == '__main__':
    tests = [
        ('11/11/2010 - 09:00am : He got nausea, vomiting and died 4 hours later',
         {'Sex': 'Male',
          'Symptoms': ['got nausea', 'vomiting'],
          'Death': True,
          'Death Time': datetime.datetime(2010, 11, 11, 13, 0),
          'Record Time': datetime.datetime(2010, 11, 11, 9, 0)}),
        ('11/11/2010 - 09:00am : She got heart burn, vomiting of blood and died 1 hours later in the operation room',
         {'Sex': 'Female',
          'Symptoms': ['got heart burn', 'vomiting of blood'],
          'Death': True,
          'Death Time': datetime.datetime(2010, 11, 11, 10, 0),
          'Record Time': datetime.datetime(2010, 11, 11, 9, 0)}),
    ]

    for record, answer in tests:
        result = parse_record(record)
        pprint.pprint(result)
        assert result == answer
        print()

This yields:

{'Death': True,
 'Death Time': datetime.datetime(2010, 11, 11, 13, 0),
 'Record Time': datetime.datetime(2010, 11, 11, 9, 0),
 'Sex': 'Male',
 'Symptoms': ['got nausea', 'vomiting']}

{'Death': True,
 'Death Time': datetime.datetime(2010, 11, 11, 10, 0),
 'Record Time': datetime.datetime(2010, 11, 11, 9, 0),
 'Sex': 'Female',
 'Symptoms': ['got heart burn', 'vomiting of blood']}
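
Since the question asks for the data to end up in a CSV file, dictionaries shaped like the ones above could be written out with the standard `csv` module. The following is a minimal sketch, not part of the original answer: the helper name `records_to_csv`, the column order, and the choice to join symptoms with "; " are all assumptions.

```python
import csv
import datetime
import io

# Hypothetical column order; adjust it to whatever the database or CSV needs.
FIELDNAMES = ['Record Time', 'Sex', 'Symptoms', 'Death', 'Death Time']


def records_to_csv(records, out):
    """Write dicts shaped like the results above to a text stream as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for record in records:
        row = dict(record)
        # A CSV cell holds one string, so join the symptom list.
        row['Symptoms'] = '; '.join(record['Symptoms'])
        for key in ('Record Time', 'Death Time'):
            if row[key] is not None:
                row[key] = row[key].isoformat(sep=' ')
        writer.writerow(row)


sample = [{'Death': True,
           'Death Time': datetime.datetime(2010, 11, 11, 13, 0),
           'Record Time': datetime.datetime(2010, 11, 11, 9, 0),
           'Sex': 'Male',
           'Symptoms': ['got nausea', 'vomiting']}]
buffer = io.StringIO()
records_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with `open(path, 'w', newline='')`, as the `csv` documentation recommends. A record whose `'Death Time'` is `None` gets an empty cell in that column.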

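Note: take care when parsing dates. Does "8/9/2010" mean August 9th or September 8th? Do all the record keepers use the same convention? If you choose to use dateutil (and I really think it is the best option if the date strings are not rigidly structured), be sure to read the section on "Format precedence" in the dateutil documentation so that you can (hopefully) resolve "8/9/2010" correctly. If you cannot guarantee that all the record keepers use the same convention for dates, then the results of this script should be checked manually. That may be wise in any case.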
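
One way to find the records that need a manual check is to parse each date under both conventions and flag the ones where the two readings differ. The sketch below uses only the standard library and assumes the records really do follow the strict "11/11/2010 - 09:00am" layout; the names `RECORD_FORMATS`, `parse_record_time` and `is_ambiguous` are hypothetical. With dateutil itself, the `dayfirst` argument of `dateutil.parser.parse` selects the convention for ambiguous dates.

```python
from datetime import datetime

# Hypothetical names; both formats assume the strict layout used in the examples.
RECORD_FORMATS = {
    'month_first': '%m/%d/%Y - %I:%M%p',
    'day_first': '%d/%m/%Y - %I:%M%p',
}


def parse_record_time(text, convention='month_first'):
    """Parse a record timestamp under an explicitly chosen convention."""
    return datetime.strptime(text, RECORD_FORMATS[convention])


def is_ambiguous(text):
    """True when both conventions parse the text but give different dates."""
    try:
        month_first = parse_record_time(text, 'month_first')
        day_first = parse_record_time(text, 'day_first')
    except ValueError:
        # Only one reading (or none) is valid, so there is nothing to confuse.
        return False
    return month_first != day_first


print(is_ambiguous('8/9/2010 - 09:00am'))    # the two readings differ
print(is_ambiguous('11/11/2010 - 09:00am'))  # both readings agree
```

Here "8/9/2010" is flagged because it reads as August 9th under one convention and September 8th under the other, while "11/11/2010" is not, since both readings give the same date.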

Regarding python - Extracting medical information using Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/4011526/
