
python - Avoiding KeyError in a DataFrame

Reposted. Author: 行者123. Last updated: 2023-12-01 08:46:02

I am validating my DataFrame with the following code:

df = df[(df[['name', 'issuer_id', 'service_area_id']].notnull().all(axis=1)) &
((df['plan_year'].notnull()) & (df['plan_year'].astype(str).str.isdigit()) & (df['plan_year'].astype(str).str.len() == 4)) &
(df[['network_url', 'formulary_url', 'sbc_download_url', 'treatment_cost_calculator_url']].astype(str).apply(lambda x: (x.str.contains(r'\A(https?:\/\/)([a-zA-Z0-9\-_])*(\.)*([a-zA-Z0-9\-]+)\.([a-zA-Z\.]{2,5})(\.*.*)?\Z')) | x.isin(['nan'])).all(axis=1)) &
(df[['promotional_label']].astype(str).apply(lambda x: (x.str.len() <= 65) | x.isin(['nan'])).all(axis=1)) &
# (df[['sort_rank_override']].astype(str).apply(lambda x: (x.str.isdigit()) | x.isin(['nan'])).all(axis=1)) &
((df['hios_plan_identifier'].notnull()) & (df['hios_plan_identifier'].str.len() >= 10) & (df['hios_plan_identifier'].str.contains(r'\A(\d{5}[A-Z]{2}[a-zA-Z0-9]{3,7}-TMP|\d{5}[A-Z]{2}\d{3,7}(\-?\d{2})*)\Z'))) &
(df['type'].isin(['MetalPlan', 'MedicarePlan', 'BasicHealthPlan', 'DualPlan', 'MedicaidPlan', 'ChipPlan'])) &
(df['price_period'].isin(['Monthly', 'Yearly'])) &
(df['is_age_29_plan'].astype(str).isin(['True', 'False', 'nan']))]
# (df[['composite_rating']].astype(str).apply(lambda x: (x.str.isin(['True', 'False']) & x.isnotin(['nan'])).all(axis=1)))]

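This fails with

KeyError: "['name'] not in index"

whenever one of the columns is missing from my DataFrame. I need to handle this for every column. How can I efficiently add a check to the code above so that each validation runs only when its column exists?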

Best answer

You can use Index.intersection to select only the columns that actually exist:

L = ['name', 'issuer_id', 'service_area_id']
cols = df.columns.intersection(L)

(df[cols].notnull().all(axis=1))
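As a rough sketch of how the same idea might extend to the other checks, the rules can be kept in a dictionary keyed by column name and applied only to the columns that are present. The `rules` dictionary, the `validate_existing` helper, and the sample data are hypothetical names introduced here for illustration; they are not part of the original answer.

```python
import pandas as pd

# Hypothetical per-column rules: each takes a column (Series) and returns a boolean mask.
rules = {
    "name": lambda s: s.notnull(),
    "issuer_id": lambda s: s.notnull(),
    "plan_year": lambda s: (
        s.notnull()
        & s.astype(str).str.isdigit()
        & (s.astype(str).str.len() == 4)
    ),
    "price_period": lambda s: s.isin(["Monthly", "Yearly"]),
}


def validate_existing(df, rules):
    """Keep the rows that pass every rule whose column exists in df."""
    mask = pd.Series(True, index=df.index)
    # Index.intersection drops rule names that are not columns of df,
    # so no rule is ever applied to a missing column.
    for col in df.columns.intersection(list(rules)):
        mask &= rules[col](df[col])
    return df[mask]


# Illustrative data: 'issuer_id' and 'price_period' are deliberately absent.
sample = pd.DataFrame({
    "name": ["a", "b", None, "d"],
    "plan_year": [2015, 5, 2016, 2017],
})
result = validate_existing(sample, rules)
print(result)
```

Rules for absent columns are simply skipped, so a missing column never raises a KeyError and never filters out rows.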

Edit:

import pandas as pd

df = pd.DataFrame({
    'name': list('abcdef'),
    'plan_year': [2015, 2015, 2015, 5, 5, 4],
})
print(df)
  name  plan_year
0    a       2015
1    b       2015
2    c       2015
3    d          5
4    e          5
5    f          4

The idea is to first create a dictionary that holds one valid value for each column:

valid = {'name':'a', 
'issuer_id':'a',
'service_area_id':'a',
'plan_year':2015,
...}

Then filter that dictionary down to the columns missing from the original DataFrame, and use assign to add them, which creates a new DataFrame:

d1 = {k: v for k, v in valid.items() if k not in df.columns}
print (d1)
{'issuer_id': 'a', 'service_area_id': 'a'}


df1 = df.assign(**d1)
print (df1)
  name  plan_year issuer_id service_area_id
0    a       2015         a               a
1    b       2015         a               a
2    c       2015         a               a
3    d          5         a               a
4    e          5         a               a
5    f          4         a               a
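A small sketch of what this step does, on illustrative data: `assign` returns a new DataFrame and broadcasts each scalar placeholder to every row, so the original frame keeps only its own columns. The `placeholders` name and the sample frame are assumptions made for this example. Note that each placeholder has to satisfy its column's check itself; otherwise every row would be filtered out.

```python
import pandas as pd

df = pd.DataFrame({"name": list("abc"), "plan_year": [2015, 2015, 5]})

# Illustrative placeholder for a column that df does not have.
placeholders = {"issuer_id": "a"}

# assign returns a new DataFrame; df itself is not modified.
df1 = df.assign(**placeholders)

print(list(df.columns))           # columns of the original frame
print(list(df1.columns))          # columns of the new frame
print(df1["issuer_id"].tolist())  # the scalar is repeated on every row
```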

Then apply the filter:

m1 = (df1[['name', 'issuer_id', 'service_area_id']].notnull().all(axis=1)) 
m2 = ((df1['plan_year'].notnull()) &
(df1['plan_year'].astype(str).str.isdigit()) &
(df1['plan_year'].astype(str).str.len() == 4))

df1 = df1[m1 & m2]
print (df1)
  name  plan_year issuer_id service_area_id
0    a       2015         a               a
1    b       2015         a               a
2    c       2015         a               a

Finally, you can remove the helper columns:

df1 = df1.drop(columns=list(d1))
print (df1)
  name  plan_year
0    a       2015
1    b       2015
2    c       2015
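The steps above can be collected into one helper. This is only a sketch under stated assumptions: `filter_with_placeholders` and `build_mask` are hypothetical names, and the mask reproduces only the two checks shown in this answer, not the full validation from the question.

```python
import pandas as pd


def filter_with_placeholders(df, valid, build_mask):
    """Pad missing columns with placeholders, filter, then drop the padding.

    valid: dict mapping column name to a placeholder value that passes validation.
    build_mask: callable that takes the padded DataFrame and returns a boolean mask.
    """
    missing = {k: v for k, v in valid.items() if k not in df.columns}
    padded = df.assign(**missing)
    return padded[build_mask(padded)].drop(columns=list(missing))


def build_mask(d):
    # Same two checks as in the answer above.
    m1 = d[["name", "issuer_id", "service_area_id"]].notnull().all(axis=1)
    year = d["plan_year"].astype(str)
    m2 = d["plan_year"].notnull() & year.str.isdigit() & (year.str.len() == 4)
    return m1 & m2


df = pd.DataFrame({
    "name": list("abcdef"),
    "plan_year": [2015, 2015, 2015, 5, 5, 4],
})
valid = {"name": "a", "issuer_id": "a", "service_area_id": "a", "plan_year": 2015}

out = filter_with_placeholders(df, valid, build_mask)
print(out)
```

Because the helper columns are added and removed inside the function, the caller gets back a frame with exactly the columns it passed in.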

Regarding python - Avoiding KeyError in a DataFrame, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53299337/
