
python - Find and record the failed validation conditions in pandas

Reposted. Author: 行者123. Updated: 2023-12-01 01:25:24

I have a DataFrame df:

   plan_year                          name metal_level_name
0      20118  Gold Heritage Plus 1500 - 02             Gold
1       2018                           NaN         Platinum
2       2018  Gold Heritage Plus 2000 - 01             Gold

I have validated the plan_year and name columns as follows:

m4 = ((df['plan_year'].notnull()) & (df['plan_year'].astype(str).str.isdigit()) & (df['plan_year'].astype(str).str.len() == 4))

m1 = (df[['name']].notnull().all(axis=1))

This gives me the valid DataFrame:

df1 = df[m1 & m4]

I can also get the rows that are not in df1 (the invalid rows):

merged = df.merge(df1.drop_duplicates(), how='outer', indicator=True)
merged[merged['_merge'] == 'left_only']

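I want to track which row failed because of which validation.

I would like a DataFrame that contains all the invalid rows together with the reason, like this: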

   plan_year                          name metal_level_name                 Failed message
0      20118  Gold Heritage Plus 1500 - 02             Gold  Failed due to wrong plan_year
1       2018                           NaN         Platinum     name column cannot be null

Can someone help me with this?

Best answer

You can use numpy.select with the boolean masks inverted by ~:

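import numpy as np

message1 = 'name column cannot be null'
message4 = 'Failed due to wrong plan_year'

# np.select uses the message of the first condition that is True for each row;
# rows that pass every check get the default 'OK'
df['Failed message'] = np.select([~m1, ~m4], [message1, message4], default='OK')
print(df)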
   plan_year                          name metal_level_name  \
0      20118  Gold Heritage Plus 1500 - 02             Gold
1       2018                           NaN         Platinum
2       2018  Gold Heritage Plus 2000 - 01             Gold

                  Failed message
0  Failed due to wrong plan_year
1     name column cannot be null
2                             OK
# keep only the rows that failed a validation
df1 = df[df['Failed message'] != 'OK']
print(df1)
   plan_year                          name metal_level_name  \
0      20118  Gold Heritage Plus 1500 - 02             Gold
1       2018                           NaN         Platinum

                  Failed message
0  Failed due to wrong plan_year
1     name column cannot be null
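
For reference, the numpy.select approach above can be put together as one self-contained script. This is only a sketch: the DataFrame is rebuilt by hand from the question's sample rows, and the names year, conditions, messages and invalid are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about the dtypes)
df = pd.DataFrame({
    "plan_year": [20118, 2018, 2018],
    "name": ["Gold Heritage Plus 1500 - 02", np.nan, "Gold Heritage Plus 2000 - 01"],
    "metal_level_name": ["Gold", "Platinum", "Gold"],
})

# Validation masks: True means the row passes the check
year = df["plan_year"].astype(str)
m4 = df["plan_year"].notnull() & year.str.isdigit() & (year.str.len() == 4)
m1 = df["name"].notnull()

# Inverted masks mark the failures; each one is paired with its message
conditions = [~m1, ~m4]
messages = ["name column cannot be null", "Failed due to wrong plan_year"]
df["Failed message"] = np.select(conditions, messages, default="OK")

# Rows that failed at least one check
invalid = df[df["Failed message"] != "OK"]
print(invalid)
```

Note that np.select only reports the first condition that matches, so a row that fails both checks would show just the name message here.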

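EDIT: For multiple error messages per row, create a new DataFrame of the inverted masks with concat, matrix-multiply it by the column names plus a separator using dot, and finally remove the trailing separator with rstrip: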

print (df)
   plan_year                          name metal_level_name
0      20118  Gold Heritage Plus 1500 - 02             Gold
1       2018                           NaN         Platinum
2       2018  Gold Heritage Plus 2000 - 01             Gold
1      20148                           NaN         Platinum

message1 = 'name column cannot be null'
message4 = 'Failed due to wrong plan_year'

# m1 and m4 must be recomputed on this 4-row df before they are combined
df1 = pd.concat([~m1, ~m4], axis=1, keys=[message1, message4])
print(df1)
   name column cannot be null  Failed due to wrong plan_year
0                       False                           True
1                        True                          False
2                       False                          False
1                        True                           True


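# True * 'msg, ' gives 'msg, ' and False * 'msg, ' gives '', so dot concatenates
# the messages of every failed check in a row
df['Failed message'] = df1.dot(df1.columns + ', ').str.rstrip(', ')
print(df)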

   plan_year                          name metal_level_name  \
0      20118  Gold Heritage Plus 1500 - 02             Gold
1       2018                           NaN         Platinum
2       2018  Gold Heritage Plus 2000 - 01             Gold
1      20148                           NaN         Platinum

                                      Failed message
0                      Failed due to wrong plan_year
1                         name column cannot be null
2
1  name column cannot be null, Failed due to wron...
# rows with an empty message passed every check, so keep the others
df1 = df[df['Failed message'] != '']
print(df1)
   plan_year                          name metal_level_name  \
0      20118  Gold Heritage Plus 1500 - 02             Gold
1       2018                           NaN         Platinum
1      20148                           NaN         Platinum

                                      Failed message
0                      Failed due to wrong plan_year
1                         name column cannot be null
1  name column cannot be null, Failed due to wron...
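
As a cross-check of the multi-message result, here is a self-contained sketch that builds the same table of inverted masks but joins the messages with plain Python instead of dot. This is a swapped-in alternative, not the answer's method: it does not rely on multiplying booleans by strings, and it needs no rstrip. The sample data is rebuilt by hand with a default 0..3 index (the listing above shows the label 1 on the last row), and the names flags, year and invalid are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand; a default RangeIndex is assumed
df = pd.DataFrame({
    "plan_year": [20118, 2018, 2018, 20148],
    "name": ["Gold Heritage Plus 1500 - 02", np.nan,
             "Gold Heritage Plus 2000 - 01", np.nan],
    "metal_level_name": ["Gold", "Platinum", "Gold", "Platinum"],
})

message1 = "name column cannot be null"
message4 = "Failed due to wrong plan_year"

# Validation masks: True means the row passes the check
year = df["plan_year"].astype(str)
m4 = df["plan_year"].notnull() & year.str.isdigit() & (year.str.len() == 4)
m1 = df["name"].notnull()

# One boolean column per message; True means that check failed
flags = pd.concat([~m1, ~m4], axis=1, keys=[message1, message4])

# For each row, join the messages of the checks that failed
df["Failed message"] = [
    ", ".join(msg for msg, failed in zip(flags.columns, row) if failed)
    for row in flags.to_numpy()
]

# Rows with an empty message passed every check
invalid = df[df["Failed message"] != ""]
print(invalid)
```

The trade-off is a Python-level loop over the rows, which is usually fine for validation reports but slower than dot on very large frames.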

Regarding python - Find and record the failed validation conditions in pandas, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53407035/
