
python - Add an error log message to rows in a pandas DataFrame

Reposted. Author: 行者123. Updated: 2023-12-01 01:26:19

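Following this answer, Avoiding KeyError in dataframe, I was able to get the validation working. However, I also need to track which row failed and which validation condition it failed on.

Is there a way to add a new column that holds the failure message?

My code: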

import numpy as np

# Default values for columns that may be missing from df.
valid_dict = {
    'name': 'WI 80 INDEMNITY 18 OPTION 1 SILVER RX $10/45/90/25%',
    'issuer_id': 484,
    'service_area_id': 1,
    'plan_year': 2018,
    'network_url': np.nan,
    'formulary_url': np.nan,
    'sbc_download_url': np.nan,
    'treatment_cost_calculator_url': np.nan,
    'promotional_label': np.nan,
    'hios_plan_identifier': '99806CAAUSJ-TMP',
    'type': 'MetalPlan',
    'price_period': 'Monthly',
    'is_age_29_plan': False,
    'sort_rank_override': np.nan,
    'composite_rating': False,
}

data_obj = DataService()
hios_issuer_identifer_list = data_obj.get_hios_issuer_identifer(df)

d1 = {k: v for k, v in valid_dict.items() if k in set(valid_dict.keys()) - set(df.columns)}
df1 = df.assign(**d1)
cols_url = df.columns.intersection(['network_url', 'formulary_url', 'sbc_download_url', 'treatment_cost_calculator_url'])
m1 = (df1[['name', 'issuer_id', 'service_area_id']].notnull().all(axis=1))
m2 = (df1[['promotional_label']].astype(str).apply(lambda x: (x.str.len() <= 65) | x.isin(['nan'])).all(axis=1))
m3 = (df1[cols_url].astype(str).apply(lambda x: (x.str.contains(r'\A(https?:\/\/)([a-zA-Z0-9\-_])*(\.)*([a-zA-Z0-9\-]+)\.([a-zA-Z\.]{2,5})(\.*.*)?\Z')) | x.isin(['nan'])).all(axis=1))
m4 = ((df1['plan_year'].notnull()) & (df1['plan_year'].astype(str).str.isdigit()) & (df1['plan_year'].astype(str).str.len() == 4))
m5 = ((df1['hios_plan_identifier'].notnull()) & (df1['hios_plan_identifier'].str.len() >= 10) & (df1['hios_plan_identifier'].str.contains(r'\A(\d{5}[A-Z]{2}[a-zA-Z0-9]{3,7}-TMP|\d{5}[A-Z]{2}\d{3,7}(\-?\d{2})*)\Z')))
m6 = (df1['type'].isin(['MetalPlan', 'MedicarePlan', 'BasicHealthPlan', 'DualPlan', 'MedicaidPlan', 'ChipPlan']))
m7 = (df1['price_period'].isin(['Monthly', 'Yearly']))
m8 = (df1['is_age_29_plan'].astype(str).isin(['True', 'False', 'nan']))
m9 = (df1[['sort_rank_override']].astype(str).apply(lambda x: (x.str.isdigit()) | x.isin(['nan'])).all(axis=1))
m10 = (df1['composite_rating'].astype(str).isin(['True', 'False']))
m11 = (df1['hios_plan_identifier'].astype(str).str[:5].isin(hios_issuer_identifer_list))

df1 = df1[m1 & m2 & m3 & m4 & m5 & m6 & m7 & m8 & m9 & m10 & m11].drop(d1.keys(), axis=1)

merged = df.merge(df1.drop_duplicates(), how='outer', indicator=True)
merged[merged['_merge'] == 'left_only'].to_csv('logs/invalid_plan_data.csv')

return df1

Something like the following:

   wellthie_issuer_identifier  issuer_name  ...  service_area_id               _error
0                    UHC99806  Fake Humana  ...                1  failed on plan_year

Best answer

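df1 = df1[m1 & m2 & m3 & m4 & m5 & m6 & m7 & m8 & m9 & m10 & m11].drop(d1.keys(), axis=1)

This line selects the rows for which none of the conditions failed. Clearly you will not find the failure information there, but that is fine: this is the validated part and should contain no errors.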
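To make the point concrete, here is a minimal sketch of splitting a frame into passing and failing rows by negating the combined mask. The sample data and the two checks (`m_name`, `m_year`) are hypothetical stand-ins for the question's `m1` ... `m11`.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the question.
df = pd.DataFrame({
    "name": ["Plan A", None, "Plan C"],
    "plan_year": ["2018", "2018", "18"],
})

# Two illustrative checks (True means the row passes).
m_name = df["name"].notnull()
m_year = df["plan_year"].str.isdigit() & (df["plan_year"].str.len() == 4)

all_ok = m_name & m_year
valid = df[all_ok]      # rows that pass every check
invalid = df[~all_ok]   # rows that fail at least one check
```

The `invalid` frame says which rows failed, but not why; the individual masks still have to be consulted for that.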

You can capture the errors by making a separate selection before dropping the failed rows:

df_error = df1.copy()
df_error['error_message'] = ~m1
...
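The elided part could be filled in many ways. One possible sketch, assuming the masks are kept in a dictionary keyed by a check name (the `checks` dictionary and the `failed_on_` prefix are illustrative choices, not part of the original answer):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["Plan A", None, "Plan C"],
    "price_period": ["Monthly", "Yearly", "Weekly"],
})

# Hypothetical check names mapped to pass masks (True = row passes).
checks = {
    "name": df["name"].notnull(),
    "price_period": df["price_period"].isin(["Monthly", "Yearly"]),
}

df_error = df.copy()
for check_name, passed in checks.items():
    # Negate the mask so that True marks a failure.
    df_error["failed_on_" + check_name] = ~passed

# Keep only the rows with at least one failed check.
flag_cols = ["failed_on_" + c for c in checks]
df_error = df_error[df_error[flag_cols].any(axis=1)]
```

Naming the masks up front avoids repeating one assignment per condition and keeps the flag columns easy to find later.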

If a column has an error, you can instead store some error text to display in the table:

# Empty string where m1 passes, your message where it fails.
df_error['failed_on_name'] = np.where(m1, '', your_message_here)
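If a single `_error` column like the one in the question is preferred over one column per check, the failed check names can be joined row by row. This is only a sketch under assumed sample data; the `checks` dictionary and the `describe` helper are hypothetical names.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["Plan A", None, None],
    "plan_year": ["2018", "2018", "18"],
})

# Hypothetical check names mapped to pass masks (True = row passes).
checks = {
    "name": df["name"].notnull(),
    "plan_year": df["plan_year"].str.isdigit() & (df["plan_year"].str.len() == 4),
}

# Boolean frame with one column per check.
passed = pd.DataFrame(checks)

def describe(row):
    """Build a message listing every check that the row failed."""
    names = [name for name, ok in row.items() if not ok]
    return "failed on " + ", ".join(names) if names else ""

df["_error"] = passed.apply(describe, axis=1)
```

Rows that pass every check get an empty string, so `df[df["_error"] != ""]` would yield the rows to write to the invalid-data log.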

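If you want to write the errors to a log, you can loop over the error table and output a message per row (this assumes the first version, where the columns hold bool values):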

for _, row in df_error.iterrows():
    print(error_message(dict(row)))

You can then process each row with a function like this:

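def error_message(row):
    row_desc = []   # non-boolean values, used to describe the row
    error_msg = []  # names of the boolean columns that are True
    for k, v in row.items():
        if isinstance(v, (bool, np.bool_)):
            if v:
                error_msg.append(k)
        else:
            # Convert to str so that numeric values can be joined.
            row_desc.append(str(v))
    return 'Row ' + ' '.join(row_desc) + ' failed with errors: ' + ' '.join(error_msg)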
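As a variation, the messages could go through Python's standard `logging` module rather than `print`. The sketch below assumes the flag columns share a `failed_on_` prefix, which keeps genuine boolean data columns (such as `is_age_29_plan`) from being mistaken for error flags; the logger name and sample data are hypothetical.

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("plan_validation")  # hypothetical logger name

# Hypothetical error table: one data column plus two flag columns.
df_error = pd.DataFrame({
    "issuer_name": ["Fake Humana", "Fake Aetna"],
    "failed_on_name": [False, True],
    "failed_on_plan_year": [True, True],
})

prefix = "failed_on_"
flag_cols = [c for c in df_error.columns if c.startswith(prefix)]

messages = []
for idx, row in df_error.iterrows():
    # Collect the names of the checks this row failed.
    failed = [c[len(prefix):] for c in flag_cols if row[c]]
    if failed:
        msg = f"Row {idx} ({row['issuer_name']}) failed with errors: {' '.join(failed)}"
        logger.warning(msg)
        messages.append(msg)
```

Selecting the flags by column name instead of by value type is a design choice for this sketch, not something the original answer prescribes.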

Regarding "python - Add an error log message to rows in a pandas DataFrame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53319037/
