
linux - Parsing an Apache log with bash

Reposted. Author: 行者123. Updated: 2023-12-03 09:55:43

I want to parse an Apache log file such as:

1.1.1.1 - - [12/Dec/2019:18:25:11 +0100] "GET /endpoint1/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
1.1.1.1 - - [13/Dec/2019:18:25:11 +0100] "GET /endpoint1/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
2.2.2.2 - - [13/Dec/2019:18:27:11 +0100] "GET /endpoint1/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
2.2.2.2 - - [13/Jan/2020:17:15:13 +0100] "GET /endpoint2/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
3.3.3.3 - - [13/Jan/2020:17:15:13 +0100] "GET /endpoint2/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
1.1.1.1 - - [13/Feb/2020:17:15:13 +0100] "GET /endpoint2/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
4.4.4.4 - - [13/Feb/2020:17:15:13 +0100] "GET /endpoint2/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
4.4.4.4 - - [13/Feb/2020:17:15:13 +0100] "GET /endpoint2/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
4.4.4.4 - - [13/Feb/2020:17:15:13 +0100] "GET /endpoint2/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
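I need to get the list of client IPs that visited in each month. I have something like this:
awk '{print $1,$4}' access.log | grep Dec | cut -d" " -f1 | uniq -c
But this is wrong, because it counts the visiting IPs per day.
The expected result should look like this (indentation doesn't matter):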
Dec 2019
1.1.1.1 2
2.2.2.2 1
Jan 2020
2.2.2.2 1
3.3.3.3 1
Feb 2020
4.4.4.4 3
1.1.1.1 1
where 2 is the total number of visits from IP 1.1.1.1 in December 2019.
Could you suggest an approach?

Best answer

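One for GNU awk that outputs in the order the data was fed in (i.e., chronologically ordered input such as log records comes out in chronological order):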

$ gawk '                                 # using GNU awk
BEGIN {
    a[""][""]                            # mark a as a 2D array (array of arrays)
}
{
    split($4,t,/[\/:]/)                  # split datetime on "/" and ":"
    my=t[2] OFS t[3]                     # my = month year, e.g. "Dec 2019"
    if(!(my in mye)) {                   # if this month-year is unseen
        mye[my]=++myi                    # give it the next index
        mya[myi]=my                      # and record it in arrival order
    }
    a[mye[my]][$1]++                     # count the visit for this month and IP
}
END {                                    # in the end
    # PROCINFO["sorted_in"]="@val_num_desc"  # this may work for ordering visits
    for(i=1;i<=myi;i++) {                # in fed order
        print mya[i]                     # print month year
        for(j in a[i])                   # then related IPs in no particular order
            print j,a[i][j]              # output IP and count
    }
}' file
Output:
Dec 2019
1.1.1.1 2
2.2.2.2 1
Jan 2020
2.2.2.2 1
3.3.3.3 1
Feb 2020
1.1.1.1 1
4.4.4.4 3
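
The commented-out PROCINFO["sorted_in"] line in the script hints at ordering each month's IPs by visit count. Below is a minimal sketch of that idea, assuming GNU awk 4.0 or later (arrays of arrays and PROCINFO["sorted_in"] are gawk extensions). The function name monthly_visits_by_count and the array names are hypothetical, chosen for illustration, and are not part of the original answer.

```shell
# Hypothetical wrapper; requires GNU awk (gawk) 4.0 or later.
monthly_visits_by_count() {
    gawk '
    {
        split($4, t, /[\/:]/)          # "[12/Dec/2019:18:25:11" -> t[2]="Dec", t[3]="2019"
        my = t[2] OFS t[3]             # month-year key, e.g. "Dec 2019"
        if (!(my in seen)) {           # first time this month-year appears
            seen[my] = ++n             # remember its arrival index
            label[n] = my
        }
        hits[seen[my]][$1]++           # count the visit for this month and IP
    }
    END {
        # Make every for-in loop walk elements by numeric value, high to low.
        PROCINFO["sorted_in"] = "@val_num_desc"
        for (i = 1; i <= n; i++) {     # months in the order they were read
            print label[i]
            for (ip in hits[i])        # IPs, most visits first
                print ip, hits[i][ip]
        }
    }' "$@"
}
```

Usage: monthly_visits_by_count access.log. With the sample log, the Feb 2020 block should then list 4.4.4.4 (3 visits) before 1.1.1.1 (1 visit). The relative order of IPs with equal counts is not something this sketch controls.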

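The answer relies on gawk's arrays of arrays. Where only a POSIX awk is available, the same grouping can be approximated with composite keys. This is a sketch under that assumption, not the original author's method; the function name monthly_visits_posix and all array names are hypothetical. Months and IPs are both printed in first-seen order.

```shell
# Hypothetical wrapper; uses only POSIX awk features (no arrays of arrays).
monthly_visits_posix() {
    awk '
    {
        split($4, t, "[/:]")                   # string separator is treated as a regex
        my = t[2] " " t[3]                     # month-year key, e.g. "Dec 2019"
        if (!(my in seen)) {                   # first time this month-year appears
            seen[my] = 1
            months[++nm] = my                  # remember months in arrival order
        }
        if (!((my, $1) in count))              # first time this IP appears in this month
            ips[my, ++nip[my]] = $1            # remember IPs in arrival order per month
        count[my, $1]++                        # composite key: month-year SUBSEP ip
    }
    END {
        for (i = 1; i <= nm; i++) {
            my = months[i]
            print my
            for (k = 1; k <= nip[my]; k++) {
                ip = ips[my, k]
                print ip, count[my, ip]
            }
        }
    }' "$@"
}
```

Usage: monthly_visits_posix access.log. Because IPs are emitted in first-seen order, the result is deterministic for a given log, unlike a plain for-in loop over an awk array.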
Regarding "linux - Parsing an Apache log with bash", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64336555/
