
python - Parsing groups of related lines in a text file with a Python regular expression


I have a file containing the following text:

    $ more audit.log
    2018-01-31 15:34:08 GMT:10.34.160.60(63788):agent3@pem:[31884]00000:LOG:  statement: DROP TABLE tmp_zombies
    2018-01-31 15:58:52 GMT:127.0.0.1(45050):agent1@pem:[13182]00000:LOG:  statement: CREATE TEMP TABLE tmp_zombies(jagpid int4)
    2018-01-31 15:58:52 GMT:127.0.0.1(45050):agent1@pem:[13182]00000:LOG:  statement: DROP TABLE tmp_zombies
    2018-01-31 16:24:00 GMT:10.34.160.55(57199):agent8@pem:[27888]00000:LOG:  statement: CREATE TEMP TABLE tmp_zombies(jagpid int4)
    2018-01-31 16:24:00 GMT:10.34.160.55(57199):agent8@pem:[27888]00000:LOG:  statement: DROP TABLE tmp_zombies
    2018-01-31 21:08:47 GMT:[local]:pgsql@p106:[26349]00000:LOG:  statement: create table global_pg_audit
        (
           rolename         text not null,
           stmt_timestamp   timestamp not null,
           source_ip        text,
           target_ip        text,
           dbname           text,
           pid              text,
           statement_type   text,
           statement        text
        );
    2018-01-31 15:34:08 GMT:10.34.160.60(63788):agent3@pem:[31884]00000:LOG:  statement: DROP TABLE tmp_zombies

When I run this Python code:

    import re

    fullpathname = './audit.log'
    regex_pattern = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(.*?)$', re.MULTILINE | re.DOTALL)
    with open(fullpathname, 'r') as f:
        log_entries = regex_pattern.findall(f.read())

    counter = 0
    for entry in log_entries:
        print('%d=>[' % counter, entry, ']')
        counter = counter + 1

The output is:

    0=>[ ('2018-01-31 15:34:08', ' GMT:10.34.160.60(63788):agent3@pem:[31884]00000:LOG:  statement: DROP TABLE tmp_zombies') ]
    1=>[ ('2018-01-31 15:58:52', ' GMT:127.0.0.1(45050):agent1@pem:[13182]00000:LOG:  statement: CREATE TEMP TABLE tmp_zombies(jagpid int4)') ]
    2=>[ ('2018-01-31 15:58:52', ' GMT:127.0.0.1(45050):agent1@pem:[13182]00000:LOG:  statement: DROP TABLE tmp_zombies') ]
    3=>[ ('2018-01-31 16:24:00', ' GMT:10.34.160.55(57199):agent8@pem:[27888]00000:LOG:  statement: CREATE TEMP TABLE tmp_zombies(jagpid int4)') ]
    4=>[ ('2018-01-31 16:24:00', ' GMT:10.34.160.55(57199):agent8@pem:[27888]00000:LOG:  statement: DROP TABLE tmp_zombies') ]
    5=>[ ('2018-01-31 21:08:47', ' GMT:[local]:pgsql@p106:[26349]00000:LOG:  statement: create table global_pg_audit ') ]
    6=>[ ('2018-01-31 15:34:08', ' GMT:10.34.160.60(63788):agent3@pem:[31884]00000:LOG:  statement: DROP TABLE tmp_zombies') ]
    7=>[ ('2018-01-31 15:58:52', ' GMT:127.0.0.1(45050):agent1@pem:[13182]00000:LOG:  statement: CREATE TEMP TABLE tmp_zombies(jagpid int4)') ]

Note that entry 5 in the output does not contain the complete statement, which should be:

    create table global_pg_audit
        (
           rolename         text not null,
           stmt_timestamp   timestamp not null,
           source_ip        text,
           target_ip        text,
           dbname           text,
           pid              text,
           statement_type   text,
           statement        text
        );

What is wrong with the code?

Thanks a lot!

Best answer

Your regular expression is anchored to the end of the line:

^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(.*?)$

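Because you enabled multiline mode, $ matches just before each newline, not only at the end of the file. The lazy (.*?) therefore stops at the first line ending it reaches, even though DOTALL would let it continue across newlines. That is why the match ends after global_pg_audit.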
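The behaviour can be reproduced on a smaller input. The following is a minimal sketch: the `sample` string is a hypothetical two-entry log, shortened from the one in the question, and the pattern is the one from the question, unchanged.

```python
import re

# Hypothetical sample, shortened from the log in the question:
# one multi-line entry followed by one single-line entry.
sample = (
    "2018-01-31 21:08:47 GMT:[local]:LOG:  statement: create table t\n"
    "    (\n"
    "       a text\n"
    "    );\n"
    "2018-01-31 21:09:00 GMT:[local]:LOG:  statement: DROP TABLE t\n"
)

# The pattern from the question: with re.MULTILINE, $ can match before
# every newline, so the lazy group ends at the first line break.
pattern = re.compile(
    r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(.*?)$',
    re.MULTILINE | re.DOTALL,
)

entries = pattern.findall(sample)
for timestamp, rest in entries:
    print(repr(timestamp), repr(rest))
```

Both entries are found, but the second group of the first entry stops at "create table t"; the column list and the closing ");" are never captured.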


What you want is to match up to the next line that starts with a date. You can do that with a lookahead:

^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(.*?)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}|\Z)

The alternation |\Z lets the regex match the last entry even though no date follows it.
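As a rough illustration, here is the lookahead pattern applied to a small input. The `sample` string is a hypothetical two-entry log shortened from the one in the question, and the trailing `rstrip()` is an addition of this sketch (the lazy group otherwise keeps the newline that precedes the next date).

```python
import re

# Hypothetical sample, shortened from the log in the question.
sample = (
    "2018-01-31 21:08:47 GMT:[local]:LOG:  statement: create table t\n"
    "    (\n"
    "       a text\n"
    "    );\n"
    "2018-01-31 21:09:00 GMT:[local]:LOG:  statement: DROP TABLE t\n"
)

# The lazy group now runs until the lookahead sees the next date,
# or until \Z (the very end of the string) for the final entry.
pattern = re.compile(
    r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(.*?)'
    r'(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}|\Z)',
    re.MULTILINE | re.DOTALL,
)

# Strip the trailing newline that each captured entry carries.
entries = [(ts, rest.rstrip()) for ts, rest in pattern.findall(sample)]

for timestamp, rest in entries:
    print(timestamp)
    print(rest)
```

The first entry now contains the whole statement through ");". One caveat: the lookahead as written is not anchored to a line start, so a date-like string inside a statement would end the entry early. Writing the lookahead as (?=^\d{4}-... |\Z) with the same flags is one possible tightening; that variant is a suggestion of this sketch, not part of the answer above.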

See also the regex demo.

This question and answer come from Stack Overflow, "python - Parsing groups of related lines in a text file with a Python regular expression": https://stackoverflow.com/questions/48553106/
