Regex Group, problem with capturing IP

Reposted. Author: 行者123. Updated: 2023-12-01 23:19:30
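I'm posting slightly modified logs.

I have a regex that should match three different groups in a log line: the time, the IP, and the number of messages received by the SMTP server.

I tried the following regex: (\d{2}.\d{2}.\d{4} \d{2}:\d{2}:\d{2}).*(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})..disconnected.?\s+(\d+) message[s]

The only problem is with the second group, the IP. To show the problem: in the first line the IP is 11.132.8.61, but what regexr catches is only 1.132.8.6, so it drops some digits. I expected \d{1,3} to match all two or three digits when there is more than one. It does so for the second octet, but not for the first or the last.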

[16A4:000C-0780] 01.12.2020 01:00:07   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000E-07F8] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000E-0780] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-0780] 01.12.2020 01:00:07 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-07F8] 01.12.2020 01:00:08 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-0780] 01.12.2020 01:04:51 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-07F8] 01.12.2020 01:30:46 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-0780] 01.12.2020 01:30:46 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000E-0780] 01.12.2020 01:33:25 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received
[16A4:000C-07F8] 01.12.2020 01:33:25 SMTP Server: 11.132.8.61 disconnected. 1 message[s] received

[12CC:0015-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000B-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000F-0FF0] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000F-120C] 30.11.2020 05:10:05 SMTP Server: bsicip03.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:0015-118C] 30.11.2020 05:10:05 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:0014-118C] 30.11.2020 05:10:05 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000B-120C] 30.11.2020 05:10:05 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received
[12CC:000A-120C] 30.11.2020 05:10:05 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received

The expected output would be:
match[1] = 01.12.2020 01:00:07
match[2] = 11.132.8.61
match[3] = 1

Best answer

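Change .* to .*? (or, assuming you can expect at least one character between the capture groups, .+?) to make the subexpression non-greedy.

That way, the wildcard no longer "steals" up to two leading digits from what the following \d{1,3} subexpression should match. A greedy .* consumes as much as it can and gives back only the minimum the rest of the pattern needs, and \d{1,3} is satisfied with a single digit.

A simple example: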

# !! BROKEN: greedy.
PS> if (' 123' -match '.*(\d{1,3})') { $Matches[1] }

3 # !! Only the LAST digit matched, because .* matched as much as it
# !! could while still matching \d{1,3}
# OK: non-greedy.
PS> if (' 123' -match '.*?(\d{1,3})') { $Matches[1] }

123 # OK - all 3 digits matched, because .*? matched as little as it
# could while still matching \d{1,3}

Putting it all together (note that I'm using .+? also before disconnected, in lieu of ..):

'[16A4:000C-0780] 01.12.2020 01:00:07   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received',
'[12CC:0015-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received' |
  ForEach-Object {
    if ($_ -match '(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}).+?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).+?disconnected\.?\s+(\d+) message\[s\]') {
      # Emit one object per matching line, built from the three capture groups.
      [pscustomobject] @{
        Count     = $Matches[3]
        Timestamp = $Matches[1]
        IP        = $Matches[2]
      }
    }
  }

The above yields:

Count Timestamp           IP
----- ---------           --
1     01.12.2020 01:00:07 11.132.8.61
1     30.11.2020 05:08:59 12.99.81.53
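
To see why both wildcards were changed, here is a minimal sketch that runs three variants of the IP portion of the pattern against the first sample line. It assumes the dots inside the IP group are escaped, as in the answer's regex; the variable names ($line, $ip, $greedy, $lazyFixedGap, $lazyBoth) are made up for illustration.

```powershell
# Sample line from the question; $ip is the IP capture group with escaped dots (an assumption).
$line = '[16A4:000C-0780] 01.12.2020 01:00:07   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received'
$ip   = '(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'

# Greedy .* keeps the first digit of the IP, and the fixed '..' before
# 'disconnected' consumes the last digit plus the space.
$greedy = if ($line -match ('.*' + $ip + '..disconnected')) { $Matches[1] }

# Lazy .*? restores the first octet, but '..' still demands two characters
# before 'disconnected', so the last digit is still lost.
$lazyFixedGap = if ($line -match ('.*?' + $ip + '..disconnected')) { $Matches[1] }

# Lazy wildcards on both sides: the group keeps every digit.
$lazyBoth = if ($line -match ('.+?' + $ip + '.+?disconnected')) { $Matches[1] }

$greedy        # 1.132.8.6
$lazyFixedGap  # 11.132.8.6
$lazyBoth      # 11.132.8.61
```

The fixed two-character gap only fits the lines where the IP is wrapped in parentheses, where ") " sits between the address and disconnected. A lazy .+? covers both line formats.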

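Note:

  • Generally (though probably not necessary in your case), you can make the regex more robust by surrounding subexpressions such as \d{1,3} with word-boundary assertions (\b), so that they don't accidentally match inside longer digit sequences. Alternatively, you can explicitly require a non-digit (\D) before and after.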
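
As a rough illustration of the word-boundary note, the sketch below compares an unanchored IP pattern with one wrapped in \b. The input string 'build 1234.5.6.7890' is invented for the example, and the variable names are hypothetical.

```powershell
$loose  = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
$strict = '\b' + $loose + '\b'

# Invented input: the digit runs at both ends are too long for an IP octet.
$bogus = 'build 1234.5.6.7890'

# Without \b the pattern matches inside the longer digit runs.
$looseHit = if ($bogus -match $loose) { $Matches[0] }     # 234.5.6.789

# With \b the first and last octets must start and end at a word boundary,
# so the bogus string no longer matches.
$strictOnBogus = $bogus -match $strict                    # False

# A real log fragment still matches in full.
$strictHit = if ('SMTP Server: 11.132.8.61 disconnected.' -match $strict) { $Matches[0] }  # 11.132.8.61

$looseHit; $strictOnBogus; $strictHit
```

Because $Matches is only updated by a successful match, the sketch reads it inside the if block rather than after a test that may have failed.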

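Alternative solution using the -split operator:

As Lee Daley points out, you can use -split, the string splitting operator, to split your lines into fields, as a conceptually simpler alternative to a regex: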

'[16A4:000C-0780] 01.12.2020 01:00:07   SMTP Server: 11.132.8.61 disconnected. 1 message[s] received',
'[12CC:0015-118C] 30.11.2020 05:08:59 SMTP Server: bsicip01.dd.example.com (12.99.81.53) disconnected. 1 message[s] received' |
  ForEach-Object {
    # Unary -split breaks the line into whitespace-separated fields.
    $fields = -split $_
    # Negative indices count from the end, so both line formats work.
    if ($fields[-4] -eq 'disconnected.') {
      [pscustomobject] @{
        Count     = $fields[-3]
        Timestamp = '{0} {1}' -f $fields[1], $fields[2]
        IP        = $fields[-5].Trim('()')
      }
    }
  }

The above yields the same result as the regex-based solution.

Regarding "Regex Group, problem with capturing IP", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68286686/
