
regex - Log parsing using regular expressions

Reposted. Author: 行者123. Updated: 2023-12-02 04:33:50

I am trying to parse an Apache log with a regular expression in Python and assign the captured fields to separate variables.

import re

APACHE_ACCESS_LOG_PATTERN = r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+)\s*(\S+)\s*" (\d{3}) (\S+)'

logLine = '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1839'

I parse the line and assign the groups to the following variables:

match = re.search(APACHE_ACCESS_LOG_PATTERN, logLine)

host = match.group(1)
client_identd = match.group(2)
user_id = match.group(3)
date_time = match.group(4)
method = match.group(5)
endpoint = match.group(6)
protocol = match.group(7)
response_code = int(match.group(8))
content_size = match.group(9)

The pattern works for the log line above, but the regex match fails in the following cases:

'127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /" 200 1839'

'127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET / " 200 1839'

How can I fix this?
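The failure can be reproduced with a short script. This is a minimal sketch that reuses the pattern and the three log lines from the question; the names `SAMPLES` and `matched` are only illustrative. In the question's pattern, group 7 is written as `(\S+)`, so it needs at least one non-whitespace character between the endpoint and the closing quote, and the two short requests have none.

```python
import re

# Pattern from the question: group 7, (\S+), is mandatory.
APACHE_ACCESS_LOG_PATTERN = r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+)\s*(\S+)\s*" (\d{3}) (\S+)'

# The three log lines quoted in the question.
SAMPLES = [
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1839',
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /" 200 1839',
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET / " 200 1839',
]

# re.search returns None when a line does not match the pattern.
matched = [re.search(APACHE_ACCESS_LOG_PATTERN, line) is not None for line in SAMPLES]
print(matched)  # [True, False, False]
```

Because `re.search` returns `None` for the two short requests, calling `match.group(1)` on them would raise an `AttributeError`, so checking the result before reading groups is a sensible guard.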

Best answer

You need to make group 7 optional by adding a ? after it. Use the following regex:

"^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] (\S+) (\S+)\s*(\S+)?\s* (\d{3}) (\S+)"

See the DEMO.

Output:

[
[
{
"content": "127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] \"GET /images/launch-logo.gif HTTP/1.0\" 200 1839",
"isParticipating": true,
"groupNum": 0,
"startPos": 0,
"endPos": 90
},
{
"content": "127.0.0.1",
"isParticipating": true,
"groupNum": 1,
"startPos": 0,
"endPos": 9
},
{
"content": "-",
"isParticipating": true,
"groupNum": 2,
"startPos": 10,
"endPos": 11
},
{
"content": "-",
"isParticipating": true,
"groupNum": 3,
"startPos": 12,
"endPos": 13
},
{
"content": "01/Jul/1995:00:00:01 -0400",
"isParticipating": true,
"groupNum": 4,
"startPos": 15,
"endPos": 41
},
{
"content": "\"GET",
"isParticipating": true,
"groupNum": 5,
"startPos": 43,
"endPos": 47
},
{
"content": "/images/launch-logo.gif",
"isParticipating": true,
"groupNum": 6,
"startPos": 48,
"endPos": 71
},
{
"content": "HTTP/1.0\"",
"isParticipating": true,
"groupNum": 7,
"startPos": 72,
"endPos": 81
},
{
"content": "200",
"isParticipating": true,
"groupNum": 8,
"startPos": 82,
"endPos": 85
},
{
"content": "1839",
"isParticipating": true,
"groupNum": 9,
"startPos": 86,
"endPos": 90
}
],
[
{
"content": "127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] \"GET /\" 200 1839",
"isParticipating": true,
"groupNum": 0,
"startPos": 91,
"endPos": 150
},
{
"content": "127.0.0.1",
"isParticipating": true,
"groupNum": 1,
"startPos": 91,
"endPos": 100
},
{
"content": "-",
"isParticipating": true,
"groupNum": 2,
"startPos": 101,
"endPos": 102
},
{
"content": "-",
"isParticipating": true,
"groupNum": 3,
"startPos": 103,
"endPos": 104
},
{
"content": "01/Jul/1995:00:00:01 -0400",
"isParticipating": true,
"groupNum": 4,
"startPos": 106,
"endPos": 132
},
{
"content": "\"GET",
"isParticipating": true,
"groupNum": 5,
"startPos": 134,
"endPos": 138
},
{
"content": "/\"",
"isParticipating": true,
"groupNum": 6,
"startPos": 139,
"endPos": 141
},
{
"content": "",
"isParticipating": false,
"groupNum": 7,
"startPos": -1,
"endPos": -1
},
{
"content": "200",
"isParticipating": true,
"groupNum": 8,
"startPos": 142,
"endPos": 145
},
{
"content": "1839",
"isParticipating": true,
"groupNum": 9,
"startPos": 146,
"endPos": 150
}
],
[
{
"content": "127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] \"GET / \" 200 1839",
"isParticipating": true,
"groupNum": 0,
"startPos": 152,
"endPos": 212
},
{
"content": "127.0.0.1",
"isParticipating": true,
"groupNum": 1,
"startPos": 152,
"endPos": 161
},
{
"content": "-",
"isParticipating": true,
"groupNum": 2,
"startPos": 162,
"endPos": 163
},
{
"content": "-",
"isParticipating": true,
"groupNum": 3,
"startPos": 164,
"endPos": 165
},
{
"content": "01/Jul/1995:00:00:01 -0400",
"isParticipating": true,
"groupNum": 4,
"startPos": 167,
"endPos": 193
},
{
"content": "\"GET",
"isParticipating": true,
"groupNum": 5,
"startPos": 195,
"endPos": 199
},
{
"content": "/",
"isParticipating": true,
"groupNum": 6,
"startPos": 200,
"endPos": 201
},
{
"content": "\"",
"isParticipating": true,
"groupNum": 7,
"startPos": 202,
"endPos": 203
},
{
"content": "200",
"isParticipating": true,
"groupNum": 8,
"startPos": 204,
"endPos": 207
},
{
"content": "1839",
"isParticipating": true,
"groupNum": 9,
"startPos": 208,
"endPos": 212
}
]
]
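In the output above, groups 5 to 7 contain the double quotes (`"GET`, `/"`, `HTTP/1.0"`), because the answer's pattern as shown has no literal quotes around the request. The sketch below runs that pattern in Python next to a variant that restores the quotes from the question's pattern while keeping group 7 optional. The variant `QUOTED_PATTERN` and the helper `request_fields` are assumptions made for illustration; they are not part of the original answer.

```python
import re

# Pattern exactly as given in the answer: no literal quotes around the request.
ANSWER_PATTERN = r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] (\S+) (\S+)\s*(\S+)?\s* (\d{3}) (\S+)'

# Hypothetical variant (not from the answer): the question's literal quotes
# are restored and group 7 stays optional.
QUOTED_PATTERN = r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+)\s*(\S+)?\s*" (\d{3}) (\S+)'

LOG_LINES = [
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1839',
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /" 200 1839',
    '127.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET / " 200 1839',
]


def request_fields(pattern, line):
    """Return (method, endpoint, protocol), or None if the line does not match."""
    match = re.search(pattern, line)
    if match is None:
        return None
    # An optional group that did not participate is None in Python,
    # so fall back to an empty string for the protocol.
    return match.group(5), match.group(6), match.group(7) or ''


for line in LOG_LINES:
    print(request_fields(ANSWER_PATTERN, line), request_fields(QUOTED_PATTERN, line))
```

With the quoted variant, all three lines yield `GET` as the method and a quote-free endpoint, and the protocol is an empty string when the request omits it. Whether to store a missing protocol as `''` or `None` is a design choice left to the caller.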

For more on regex - log parsing using regular expressions, see the similar question on Stack Overflow: https://stackoverflow.com/questions/30956820/
