python - Validating a data frame header with pandas regex

Reposted. Author: 行者123. Updated: 2023-12-01 06:46:17

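I have an Email column in a pandas DataFrame. I want to validate it against a few allowed domain names and special characters, and save the invalid rows to a new sheet.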

Here is the sample DataFrame:

       Name                 Email  Contact No
0  siddarth    siddarth@gmail.com     4382430
1   supreet  siupreet@outlook.com    21356908
2    sreeja      sreeja@gmail.com    78940989
3   bsreddy     bsreddy@yahoo.com    43687065
4  rakshita  rakshita/hotmail.com    43685707
5     rahul        rahul\live.com    54783929
6     cahal       chahal?msdn.com   324567889
7   karthik     karthik:gmail.com     4356589
8        rk           rk;dell.com    65784930
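If you want to reproduce this without the mail_id.xlsx file, the same frame can be built directly in code. This is only a convenience sketch that copies the table above; in the question the data comes from read_excel.

```python
import pandas as pd

# Sample data copied from the table in the question.
# 'rahul\\live.com' is a normal string literal, so it holds ONE backslash.
demo_read = pd.DataFrame({
    'Name': ['siddarth', 'supreet', 'sreeja', 'bsreddy', 'rakshita',
             'rahul', 'cahal', 'karthik', 'rk'],
    'Email': ['siddarth@gmail.com', 'siupreet@outlook.com', 'sreeja@gmail.com',
              'bsreddy@yahoo.com', 'rakshita/hotmail.com', 'rahul\\live.com',
              'chahal?msdn.com', 'karthik:gmail.com', 'rk;dell.com'],
    'Contact No': [4382430, 21356908, 78940989, 43687065, 43685707,
                   54783929, 324567889, 4356589, 65784930],
})

print(demo_read)
```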

For the DataFrame above, I want to find the invalid emails and the specific domain names.

My code snippet is:

import re
import pandas as pd

demo_path = 'C:\\Users\\kiran\\Desktop\\mail_id.xlsx'
demo_read = pd.read_excel(demo_path)
# '^@' requires the value to START with '@', so no address in the data can match
pattern = re.compile(r'^@\w+\.[a-z]{0,3}$')
demo_read['Isemail'] = demo_read['Email'].apply(lambda x: bool(pattern.match(x)))
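One likely fix, sketched under the assumption that a loose shape check is enough: the pattern has to start with the user name rather than '@', and fullmatch checks the whole value. The pattern below is a simplified stand-in chosen for illustration, not a complete email validator (it rejects, for example, multi-part domains like co.uk).

```python
import re
import pandas as pd

# A deliberately simple pattern (an assumption, not a full RFC 5322 check):
# local part, one '@', a single domain label, a dot, and a 2+ letter TLD.
pattern = re.compile(r'[\w.+-]+@[\w-]+\.[a-z]{2,}', re.IGNORECASE)

emails = pd.Series(['siddarth@gmail.com', 'rakshita/hotmail.com',
                    'rahul\\live.com', 'karthik:gmail.com'])

# fullmatch requires the whole string to fit the pattern.
is_email = emails.apply(lambda x: bool(pattern.fullmatch(x)))
print(is_email.tolist())
```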

The approach above does not work, so I used the following snippets instead:

a = demo_read.loc[demo_read['Email'].str.contains('@gmail.com')]
b = demo_read.loc[demo_read['Email'].str.contains('@outlook.com')]
c = demo_read.loc[demo_read['Email'].str.contains('?')]    # raises a regex error: '?' alone is not a valid pattern
d = demo_read.loc[demo_read['Email'].str.contains('/')]
e = demo_read.loc[demo_read['Email'].str.contains(r'\\')]
d = demo_read.loc[demo_read['Email'].str.contains(r'\?')]  # overwrites the '/' result stored in d above
f = demo_read.loc[demo_read['Email'].str.contains(':')]
g = demo_read.loc[demo_read['Email'].str.contains(';')]
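The special-character filters above can be folded into a single boolean mask with one character class. Inside [...] the characters /, ?, : and ; are literal, so only the backslash needs escaping. A minimal sketch on a few of the sample addresses:

```python
import pandas as pd

emails = pd.Series(['siddarth@gmail.com', 'bsreddy@yahoo.com', 'rakshita/hotmail.com',
                    'rahul\\live.com', 'chahal?msdn.com', 'karthik:gmail.com',
                    'rk;dell.com'])

# One character class instead of one str.contains call per character.
# In the raw string, '\\' is the regex escape for a single literal backslash.
special = emails.str.contains(r'[/\\?:;]')
print(emails[special].tolist())
```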

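One more question: can a regular expression validate the domains and special characters from the snippets above and display the invalid records? Please suggest the best approach.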

The output should look like this:

       Name                 Email  Contact No
0  siddarth    siddarth@gmail.com     4382430
1   supreet  siupreet@outlook.com    21356908
2    sreeja      sreeja@gmail.com    78940989
4  rakshita  rakshita/hotmail.com    43685707
5     rahul        rahul\live.com    54783929
6     cahal       chahal?msdn.com   324567889
7   karthik     karthik:gmail.com     4356589
8        rk           rk;dell.com    65784930
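Assuming the intended rule is "the domain is gmail.com or outlook.com, or the value contains one of / \ ? : ;", the expected output can be reproduced with two masks combined by |. That rule is inferred from the snippets and the expected table; the question does not state it explicitly.

```python
import pandas as pd

demo_read = pd.DataFrame({
    'Name': ['siddarth', 'supreet', 'sreeja', 'bsreddy', 'rakshita',
             'rahul', 'cahal', 'karthik', 'rk'],
    'Email': ['siddarth@gmail.com', 'siupreet@outlook.com', 'sreeja@gmail.com',
              'bsreddy@yahoo.com', 'rakshita/hotmail.com', 'rahul\\live.com',
              'chahal?msdn.com', 'karthik:gmail.com', 'rk;dell.com'],
})

# Addresses ending in @gmail.com or @outlook.com (dots escaped so they are literal).
listed_domain = demo_read['Email'].str.contains(r'@(?:gmail|outlook)\.com$')
# Values containing any of the separator characters / \ ? : ;
special_char = demo_read['Email'].str.contains(r'[/\\?:;]')

flagged = demo_read[listed_domain | special_char]
print(flagged)
```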

Best answer

Try this for the last one, which isn't working:

demo_read.loc[demo_read['Email'].str.contains(r'\\')]
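r'\\' works because, as a regex, a literal backslash must itself be escaped. When only a plain substring test is needed, regex=False avoids the escaping altogether. A short sketch comparing the two:

```python
import pandas as pd

emails = pd.Series(['rahul\\live.com', 'siddarth@gmail.com'])  # first value has ONE backslash

# As a regex, r'\\' is the two-character pattern \\, which matches one backslash.
as_regex = emails.str.contains(r'\\')

# With regex=False the argument is a plain substring; '\\' is one backslash.
as_literal = emails.str.contains('\\', regex=False)

print(as_regex.equals(as_literal))  # True
```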

You can try this:

email_re = r"""(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"""

# Boolean mask: True where the value contains an address matching email_re
demo_read['Email'].str.contains(email_re)

# Emails that match
demo_read['Email'][demo_read['Email'].str.contains(email_re)]

# Emails that do not match
demo_read['Email'][~demo_read['Email'].str.contains(email_re)]
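Note that str.contains performs a search, so a value passes as long as an address appears anywhere in it. If the whole value has to be an address, Series.str.fullmatch (available since pandas 1.1) is stricter. The short pattern here is a simplified stand-in for the long one above, used only to show the difference:

```python
import pandas as pd

# Simplified stand-in pattern (an assumption for brevity, not the answer's regex).
email_re = r'[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}'

s = pd.Series(['sreeja@gmail.com', 'junk text sreeja@gmail.com', 'rk;dell.com'])

# contains() searches anywhere in the string, so the second value passes.
searched = s.str.contains(email_re)
# fullmatch() requires the entire value to match the pattern.
whole = s.str.fullmatch(email_re)

print(searched.tolist())  # [True, True, False]
print(whole.tolist())     # [True, False, False]
```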

For the question-mark one, try this:

demo_read.loc[demo_read['Email'].str.contains(r"\?")]
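The unescaped version fails because ? is a regex quantifier with nothing before it to repeat. A sketch showing the error and two ways around it; the Series is created with dtype=object so that pandas goes through Python's re module (other string dtypes may raise a different exception type).

```python
import re
import pandas as pd

# dtype=object so the .str methods use Python's re module.
s = pd.Series(['chahal?msdn.com', 'sreeja@gmail.com'], dtype=object)

# A bare '?' is a quantifier with nothing to repeat, so it cannot compile.
try:
    s.str.contains('?')
    raised = False
except re.error as exc:
    raised = True
    print('regex error:', exc)

# Escape it, or switch regex matching off; both treat '?' as a plain character.
escaped = s.str.contains(r'\?')
literal = s.str.contains('?', regex=False)
print(escaped.tolist(), literal.tolist())
```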

This one matches addresses that are not gmail.com or outlook.com:

demo_read[demo_read['Email'].str.contains(r"(?:(?!.*gmail.com)(?!.*outlook.com)[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])")]

Output:

      Name              Email  Contact No
3  bsreddy  bsreddy@yahoo.com    43687065
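A lookahead-free alternative, swapped in here rather than taken from the answer: pull out the part after '@' with str.extract and compare it against a list of domains with isin. The allow-list below is an assumption based on the domains the question filters on.

```python
import pandas as pd

demo_read = pd.DataFrame({
    'Name': ['siddarth', 'supreet', 'sreeja', 'bsreddy', 'rakshita',
             'rahul', 'cahal', 'karthik', 'rk'],
    'Email': ['siddarth@gmail.com', 'siupreet@outlook.com', 'sreeja@gmail.com',
              'bsreddy@yahoo.com', 'rakshita/hotmail.com', 'rahul\\live.com',
              'chahal?msdn.com', 'karthik:gmail.com', 'rk;dell.com'],
})

# Everything after the last '@'; values without an '@' give a missing value.
domain = demo_read['Email'].str.extract(r'@([^@]+)$', expand=False)

# Assumed allow-list of domains.
known_domains = ['gmail.com', 'outlook.com']

# Rows that contain an '@' and whose domain is not on the list.
other_domain = domain.notna() & ~domain.isin(known_domains)
print(demo_read[other_domain])
```

Negating the mask, demo_read[~other_domain], gives the same eight rows as the expected output.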

To get the output you are looking for, negate the mask with ~:

demo_read[~demo_read['Email'].str.contains(r"(?:(?!.*gmail.com)(?!.*outlook.com)[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])")]

Output:

       Name                 Email  Contact No
0  siddarth    siddarth@gmail.com     4382430
1   supreet  siupreet@outlook.com    21356908
2    sreeja      sreeja@gmail.com    78940989
4  rakshita  rakshita/hotmail.com    43685707
5     rahul        rahul\live.com    54783929
6     cahal       chahal?msdn.com   324567889
7   karthik     karthik:gmail.com     4356589
8        rk           rk;dell.com    65784930
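The question also wants the flagged rows saved to a separate sheet, which the answer does not cover. A sketch using pd.ExcelWriter; the helper names, sheet names and output path are hypothetical, and writing .xlsx requires an engine such as openpyxl to be installed.

```python
import pandas as pd


def split_rows(df, flagged):
    """Hypothetical helper: return (kept, flagged_rows) for a boolean mask aligned with df."""
    return df[~flagged], df[flagged]


def save_split(df, flagged, path):
    """Hypothetical helper: write kept and flagged rows to two sheets of one workbook.

    Needs an Excel engine such as openpyxl.
    """
    kept, flagged_rows = split_rows(df, flagged)
    with pd.ExcelWriter(path) as writer:
        kept.to_excel(writer, sheet_name='valid', index=False)
        flagged_rows.to_excel(writer, sheet_name='invalid', index=False)


demo = pd.DataFrame({'Email': ['sreeja@gmail.com', 'rk;dell.com', 'rahul\\live.com']})
mask = demo['Email'].str.contains(r'[/\\?:;]')
kept, flagged_rows = split_rows(demo, mask)
print(flagged_rows)
# save_split(demo, mask, 'mail_id_checked.xlsx')  # hypothetical output path
```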

A similar question on this topic (python - Validating a data frame header with pandas regex) can be found on Stack Overflow: https://stackoverflow.com/questions/59205552/
