
python - How to read a text file in a JSON-like format

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:04:42

I tried to process a .txt file with pandas, but got this error:

 pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12

I tried reading the pandas documentation but found nothing helpful.

My code is as follows:

    import pandas as pd
    df = pd.read_csv('McKData_2511.txt')
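
For context (a note that is not part of the original post): `read_csv` splits on commas by default and expects every line to have the same number of fields. The lines in this file contain different numbers of commas, which is most likely what the tokenizer error is reporting. The short sketch below uses the standard `csv` module on three abbreviated sample lines to show how the field count varies; the `sample` string is a shortened illustration, not the full file.

```python
import csv
import io

# Three abbreviated lines in the same shape as the file in the question.
sample = (
    "Error 487990425\n"
    ",{'McKesson':''ManufacturerNo':'00573016040','Manufacturer':'Pfizer'}\n"
    ",{'McKesson':''ManufacturerNo':'04110081127','Brand':'Afrin® Allergy Sinus','Manufacturer':'Bayer'}\n"
)

# Count the comma-separated fields on each line, as a CSV parser would see them.
field_counts = [len(row) for row in csv.reader(io.StringIO(sample))]
print(field_counts)  # [1, 3, 4]
```

Because the counts differ from line to line, a plain comma-separated read cannot map the lines onto a fixed set of columns.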

My text file looks like this:

        ,{'McKesson':''ManufacturerNo':'42023015925','Brand':'Generic Equivalent to Adrenalin®','Manufacturer':'Par Sterile Products LLC','CountryofOrigin':'United States','AlternatePackaging':'CT/25','Application':'Alpha- and Beta-Adrenergic Agonist','ContainerType':'Single Use Vial','DosageForm':'Injection','GenericDrugCode':'26184','GenericDrugName':'Epinephrine','NDCNumber':'42023-0159-25','StorageRequirements':'USP Controlled Room Temperature','Strength':'1 mg / mL (1:1000)','UNSPSCCode':'51391743','Volume':'1 mL'}
,{'McKesson':''ManufacturerNo':'00573016040','Manufacturer':'Pfizer','CountryofOrigin':'Unknown','ActiveIngredients':'Ibuprofen','Application':'Pain Relief','ContainerType':'Bottle','DosageForm':'Tablet','GenericDrugCode':'35743','NDCNumber':'00573-0160-40','Strength':'200 mg Strength','UNSPSCCode':'51384509','Volume':'100 per Bottle'}
,{'McKesson':''ManufacturerNo':'33332041910','Brand':'Afluria® Quadrivalent 2019 - 2020','Manufacturer':'Seqirus USA Inc','CountryofOrigin':'Australia','Application':'Flu Vaccine','ContainerType':'Multiple Dose Vial','DosageForm':'Injection','NDCNumber':'33332041910','StorageRequirements':'Requires Refrigeration','Strength':'60 mcg / 0.5 mL','Type':'Intramuscular','UNSPSCCode':'51201608','User':'Indicated For People 6 Months of Age and Above','Volume':'5 mL'}
,{'McKesson':''ManufacturerNo':'04110081127','Brand':'Afrin® Allergy Sinus','Manufacturer':'Bayer','CountryofOrigin':'Unknown','ActiveIngredients':'Oxymetazoline HCl','Application':'Sinus Relief','ContainerType':'Bottle','DosageForm':'Nasal Spray','Strength':'0.05% Strength','UNSPSCCode':'51162732','Volume':'15 mL'}
,{'McKesson':''ManufacturerNo':'04110081125','Brand':'Afrin® Original','Manufacturer':'Bayer','CountryofOrigin':'Unknown','ActiveIngredients':'Oxymetazoline HCl','Application':'Sinus Relief','ContainerType':'Bottle','DosageForm':'Nasal Spray','Strength':'0.05% Strength','UNSPSCCode':'51162732','Volume':'30 mL'}
,{'McKesson':''ManufacturerNo':'17478025310','Brand':'AK-Fluor®','Manufacturer':'Akorn','CountryofOrigin':'United States','Application':'Ophthalmic Disclosing Agent','ContainerType':'Single Dose Vial','DosageForm':'Injection','GenericDrugCode':'27760','GenericDrugName':'Fluorescein Sodium','NDCNumber':'17478025310','Strength':'10%, 500 mg / 5 mL','Type':'Intravenous','UNSPSCCode':'51441603','Volume':'5 mL'}
Error 487990425
,{'McKesson':''ManufacturerNo':'00487950103','Manufacturer':'Nephron Pharmaceutical','CountryofOrigin':'Unknown','AlternateManufacturerNumber':'1978717','Application':'Beta-Adrenergic Agonist','ContainerType':'Nebulizer Vial','DosageForm':'Solution','GenericDrugCode':'41681','GenericDrugName':'Albuterol Sulfate, Preservative Free','HCPCS':'J7609','NDCNumber':'00487-9501-03','Strength':'0.083%, 2.5 mg / 3 mL','Type':'Unit Dose, Inhalation','UNSPSCCode':'51391703','Volume':'30 Vials'}
,{'McKesson':''ManufacturerNo':'00591379760','CountryofOrigin':'Unknown','AlternateManufacturerNumber':'1151067','Application':'Beta-Adrenergic Agonist','ContainerType':'Nebulizer Vial','DosageForm':'Solution','GenericDrugCode':'41681','GenericDrugName':'Albuterol Sulfate, Preservative Free','NDCNumber':'00591-3797-60','Strength':'0.083%, 2.5 mg / 3 mL','Type':'Unit Dose, Inhalation','UNSPSCCode':'51391703','Volume':'60 Vials'}
Error 4879908743
,{'McKesson':''ManufacturerNo':'01093974344','Manufacturer':'McKesson Brand','CountryofOrigin':'Unknown','ActiveIngredients':'Ethyl Alcohol','Application':'Antiseptic','ContainerType':'Bottle','DosageForm':'Topical Solution','Strength':'70% Strength','UNSPSCCode':'42295421','Volume':'16 oz.'}
,{'McKesson':''ManufacturerNo':'70677000601','Brand':'sunmark®','Manufacturer':'McKesson Brand','CountryofOrigin':'Unknown','ActiveIngredients':'Cetirizine HCl','Application':'Allergy Relief','ContainerType':'Box','DosageForm':'Tablet','NDCNumber':'70677-0006-01','Strength':'10 mg Strength','UNSPSCCode':'51313101','Volume':'30 per Box'}

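How can I read this data into a pandas DataFrame? Many values are missing in some rows, and those cells should be null. For example, Brand is present in the first row but not in the second, so the second row should have null for Brand.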

I only want these values: ManufacturerNo, UNSPSCCode, Brand, Manufacturer, CountryofOrigin, NDCNumber. How can I solve this?

Best answer

Here is my rough approach; maybe it will help you. With this code you get a column containing the extracted data, shown here for ManufacturerNo:

    import pandas as pd

    # Split each line on the braces; column 1 then holds the text between { and }
    df = pd.read_csv('data', sep=r'[{}]', engine='python', header=None)
    # Split that text on commas, one piece per column
    df = df[1].str.split(',', expand=True)

    # Stack all split columns into a single column, however many there are
    df_all = pd.DataFrame(pd.concat([df[col] for col in df.columns]))
    # Keep only the digits that follow the ManufacturerNo key
    df_all['ManufacturerNo'] = df_all[0].str.extract(r"ManufacturerNo':'([0-9]{1,30})", expand=False)
    print(df_all)

        0                                            ManufacturerNo
    0   'McKesson':''ManufacturerNo':'42023015925'   42023015925
    1   'McKesson':''ManufacturerNo':'00573016040'   00573016040
    2   'McKesson':''ManufacturerNo':'33332041910'   33332041910
    3   'McKesson':''ManufacturerNo':'04110081127'   04110081127
    4   'McKesson':''ManufacturerNo':'04110081125'   04110081125
    5   'McKesson':''ManufacturerNo':'17478025310'   17478025310
    6   None                                         NaN
    7   'McKesson':''ManufacturerNo':'00487950103'   00487950103
    8   'McKesson':''ManufacturerNo':'00591379760'   00591379760
    9   None                                         NaN
    10  'McKesson':''ManufacturerNo':'01093974344'   01093974344
    11  'McKesson':''ManufacturerNo':'70677000601'   70677000601
    0   'Brand':'Generic Equivalent to Adrenalin®'   NaN
    1   'Manufacturer':'Pfizer'                      NaN
    2   'Brand':'Afluria® Quadrivalent 2019 - 2020'  NaN
    3   'Brand':'Afrin® Allergy Sinus'               NaN
    4   'Brand':'Afrin® Original'                    NaN
    5   'Brand':'AK-Fluor®'                          NaN
    6   None                                         NaN
    7   'Manufacturer':'Nephron Pharmaceutical'      NaN
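
One caveat with splitting on every comma: some values in the sample contain commas themselves, for example 'Strength':'10%, 500 mg / 5 mL', so they get cut into two cells. A possible workaround, sketched here and not part of the original answer, is to match whole key':'value' pairs with a regular expression instead. The names `PAIR_RE` and `parse_pairs` are illustrative, and the pattern assumes that values never contain a single quote.

```python
import re

# Matches key':'value' pairs. The opening quote of the key is deliberately not
# required, because the sample data has a malformed 'McKesson':'' prefix glued
# directly onto ManufacturerNo.
PAIR_RE = re.compile(r"(\w+)':'([^']*)'")


def parse_pairs(line):
    """Return a dict of the key/value pairs found on one line (hypothetical helper)."""
    return dict(PAIR_RE.findall(line))


line = ",{'McKesson':''ManufacturerNo':'17478025310','Strength':'10%, 500 mg / 5 mL','Type':'Intravenous'}"

# Naive comma split: the Strength value is broken into two pieces.
naive = line.strip(",{}").split(",")
print(len(naive))  # 4

# Pair-based parsing keeps the value intact.
record = parse_pairs(line)
print(record["Strength"])  # 10%, 500 mg / 5 mL
```

The empty 'McKesson' entry still shows up as a key with an empty string, which can simply be ignored when selecting columns.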

    # Drop the NaN rows and collect the extracted numbers in a list
    ManufacturerNo = df_all['ManufacturerNo'].dropna().tolist()
    print(ManufacturerNo)
    ['42023015925', '00573016040', '33332041910', '04110081127', '04110081125', '17478025310', '00487950103', '00591379760', '01093974344', '70677000601']

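If you want columns for other values, such as UNSPSCCode, create a new column with the same regular expression (or a new one as needed) and swap in the key name, for example: df_all['UNSPSCCode'].replace("UNSPSCCode':'","",regex=True)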
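
As a rough sketch of that idea, the same kind of regular expression can be applied once per key to the raw lines, which keeps each product on its own row and leaves NaN where a key is absent. This is a variation on the answer rather than the answerer's exact code: the `raw` Series holds abbreviated sample lines, and the `wanted` list is an illustrative subset of the keys the question asks for. The pattern assumes values contain no single quotes.

```python
import re

import pandas as pd

# Abbreviated sample lines; in practice these could come from the text file,
# e.g. one string per line.
raw = pd.Series([
    ",{'McKesson':''ManufacturerNo':'42023015925','Brand':'Generic Equivalent to Adrenalin®','Manufacturer':'Par Sterile Products LLC','CountryofOrigin':'United States','UNSPSCCode':'51391743'}",
    "Error 487990425",
    ",{'McKesson':''ManufacturerNo':'00573016040','Manufacturer':'Pfizer','CountryofOrigin':'Unknown','UNSPSCCode':'51384509'}",
])

wanted = ["ManufacturerNo", "UNSPSCCode", "Brand", "Manufacturer", "CountryofOrigin"]

# One str.extract call per key. The \b keeps a short key from matching the tail
# of a longer one, and the literal ':' after the key keeps "Manufacturer" from
# matching "ManufacturerNo".
result = pd.DataFrame({
    key: raw.str.extract(rf"\b{re.escape(key)}':'([^']*)'", expand=False)
    for key in wanted
})

# Lines such as "Error 487990425" match nothing and become all-NaN rows.
result = result.dropna(how="all").reset_index(drop=True)
print(result)
```

Values are kept as strings, so leading zeros in codes such as 00573016040 are preserved.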

About "python - How to read a text file in a JSON-like format": a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59030658/
