
regex - Perl regular expression to find a Java stack trace by keyword

Reposted. Author: 行者123. Updated: 2023-12-01 05:37:35

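I need to grep complete stack traces out of a log file by keyword.

This code works correctly, but it is slow on large files (the bigger the file, the slower it gets).
I think the best fix would be to improve the regular expression so that it looks for the keyword itself, but I could not get that to work.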

#!/usr/bin/perl

use strict;
use warnings;

my $regexp;
my $stacktrace;
undef $/;

$regexp = shift;
$regexp = quotemeta($regexp);

while (<>) {
    while ( $_ =~ /(?<LEVEL>^[E|W|D|I])\s
                   (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                   (?<THREAD>.*?)\/
                   (?<CLASS>.*?)\s-\s
                   (?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx ) {
        $stacktrace = $&;
        if ( $+{MESSAGE} =~ /$regexp/ ) {
            print "$stacktrace";
        }
    }
}

Usage: ./grep_log4j.pl <pattern> <file>
Example: ./grep_log4j.pl Exception sample.log
I think the problem is the line $stacktrace = $&; because if I remove it and simply print every matched entry, the script runs fast.
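The suspicion about $& is plausible on older Perls. perlvar documents that before Perl 5.20, using $& anywhere in a program makes every successful match save a copy of the string being matched; with the whole file slurped into one string and a /g loop over it, that copy can be large. A minimal sketch of one way to get the matched text without $&, using the offset arrays @- and @+ (the helper name matched_substrings is made up for illustration):

```perl
use strict;
use warnings;

# Return every substring of $text matched by $re, without touching $&.
# After a successful match, $-[0] is the start offset of the whole match
# and $+[0] is its end offset, so substr() can cut the match out directly.
sub matched_substrings {
    my ($text, $re) = @_;
    my @found;
    while ($text =~ /$re/g) {
        push @found, substr($text, $-[0], $+[0] - $-[0]);
    }
    return @found;
}

# Demo: pick out the level and timestamp of E and W entries only.
my $log = "E 111012 141606.000 t/c - boom\nI 111012 141506.000 t/c - fine\n";
print "$_\n" for matched_substrings($log, qr/^[EW]\s\d{6}\s\d{6}\.\d{3}/m);
```

Since Perl 5.10 the /p modifier together with ${^MATCH} is another documented replacement for $&. Whether either change removes the slowdown in this script depends on the Perl version in use, so it is worth timing both.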
Version of the script that prints all matches:
#!/usr/bin/perl

use strict;
use warnings;

undef $/;

while (<>) {
    while ( $_ =~ /(?<LEVEL>^[E|W|D|I])\s
                   (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                   (?<THREAD>.*?)\/
                   (?<CLASS>.*?)\s-\s
                   (?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx ) {
        print_result();
    }
}

sub print_result {
    print "LEVEL: $+{LEVEL}\n";
    print "TIMESTAMP: $+{TIMESTAMP}\n";
    print "THREAD: $+{THREAD}\n";
    print "CLASS: $+{CLASS}\n";
    print "MESSAGE: $+{MESSAGE}\n";
}

Usage: ./grep_log4j.pl <file>
Example: ./grep_log4j.pl sample.log
Log4j pattern: %-1p %d %t/%c{1} - %m%n
Sample log file:
I 111012 141506.000 thread/class - Received message: something
E 111012 141606.000 thread/class - Failed handling mobile request
java.lang.NullPointerException
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
    at java.lang.Thread.run(Thread.java:619)
W 111012 141706.000 thread/class - Received message: something
E 111012 141806.000 thread/class - Failed with Exception
java.lang.NullPointerException
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
    at java.lang.Thread.run(Thread.java:619)
D 111012 141906.000 thread/class - Received message: something
S 111012 142006.000 thread/class - Received message: something
I 111012 142106.000 thread/class - Received message: something
I 111013 142206.000 thread/class - Metrics:0/1

My regular expression can be found at http://gskinner.com/RegExr/ under the keyword log4j.

Best answer

You are using:

$/ = undef;

This makes Perl read the entire file into memory as a single record.
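To make the effect of $/ concrete, here is a small sketch that counts how many records a read loop sees in each mode. It reads from an in-memory filehandle so it runs without a log file; the helper name count_records is an illustrative assumption, not part of the original script.

```perl
use strict;
use warnings;

# Count how many records a read loop sees for the given text.
# $slurp true  -> $/ is undef, so the whole input arrives as one record.
# $slurp false -> $/ is "\n" (the default), so each line is its own record.
sub count_records {
    my ($text, $slurp) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    local $/ = $slurp ? undef : "\n";    # restored when the sub returns
    my $records = 0;
    while (defined(my $record = <$fh>)) {
        $records++;
    }
    close $fh;
    return $records;
}

my $sample = "line one\nline two\nline three\n";
printf "slurp mode: %d record(s)\n", count_records($sample, 1);
printf "line mode:  %d record(s)\n", count_records($sample, 0);
```

In slurp mode the single record is the whole file, so memory use grows with the file size; in line mode only one line is held at a time.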

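I would process the file line by line, like this (assuming a stack trace belongs to the message on the line just above it):
my $regexp = quotemeta shift;    # keyword from the command line, as in the question's script
my $matched;
while (<>) {
    # Header line: level, timestamp (date and time), thread/class, " - ", message.
    if (m/^(?<LEVEL>\S+) \s+ (?<TIMESTAMP>\d+ \s+ [\d.]+) \s+ (?<THREADCLASS>\S+) \s+ - \s+ (?<REST>.*)/x) {
        my %captures = %+;    # copy now: the next successful match overwrites %+
        $matched = ($captures{REST} =~ /$regexp/);
        if ($matched) {
            print "LEVEL: $captures{LEVEL}\n";
            ...
        }
    } elsif ($matched) {
        # Continuation line (stack trace) of an entry whose message matched.
        print;
    }
}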

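This is a general technique for parsing multi-line blocks.
The following loop reads STDIN one line at a time and hands each complete block of the log file to the subroutine process:
my $first;          # header line of the entry being collected
my $stack = "";     # continuation lines (the stack trace) of that entry
while (<STDIN>) {
    if (m/^\S /) {
        # A new entry starts: hand over the previous one, then start afresh.
        process($first, $stack) if $first;
        $first = $_;
        $stack = "";
    } else {
        $stack .= $_;
    }
}
process($first, $stack) if $first;    # the last entry has no header after it

sub process {
    my ($first, $stack) = @_;
    # ... do whatever you want here ...
}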
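As one possible way to fill in process for the original task, the sketch below keeps an entry when the keyword occurs anywhere in it, header line or stack trace. This matters for the sample log, where NullPointerException appears only in the trace lines. The names grep_entries and $process are hypothetical, and the keyword is treated as a literal string (via quotemeta), as in the question's script.

```perl
use strict;
use warnings;

# Read log entries from $fh and return those whose header line or stack
# trace contains the literal $keyword. An entry starts at a line made of
# one non-space character followed by a space (the %-1p level field).
sub grep_entries {
    my ($fh, $keyword) = @_;
    my $pattern = quotemeta $keyword;    # treat the keyword literally
    my @hits;
    my ($first, $stack) = (undef, "");

    # Hypothetical stand-in for process(): keep a finished entry on a match.
    my $process = sub {
        my ($head, $trace) = @_;
        my $entry = $head . $trace;
        push @hits, $entry if $entry =~ /$pattern/;
    };

    while (my $line = <$fh>) {
        if ($line =~ /^\S /) {
            $process->($first, $stack) if defined $first;
            ($first, $stack) = ($line, "");
        } else {
            $stack .= $line;
        }
    }
    $process->($first, $stack) if defined $first;    # flush the last entry
    return @hits;
}

# Demo on an in-memory copy of a few lines from the sample log.
my $sample = <<'LOG';
I 111012 141506.000 thread/class - Received message: something
E 111012 141606.000 thread/class - Failed handling mobile request
java.lang.NullPointerException
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
W 111012 141706.000 thread/class - Received message: something
LOG
open my $in, '<', \$sample or die "cannot open in-memory log: $!";
print for grep_entries($in, 'NullPointerException');
close $in;
```

Because only one entry is held in memory at a time, memory use depends on the longest entry, not on the size of the log file.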

Regarding "regex - Perl regular expression to find a Java stack trace by keyword", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7810805/
