
regex - Parsing a log file with regular expressions in Pig

Reposted. Author: 行者123. Updated: 2023-12-02 21:39:48

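I need to parse the following log file so that everything from the start of one timestamp (for example 150324-21:06:32:937378) up to the start of the next timestamp is treated as a single record in my script. I tried using the library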

org.apache.pig.piggybank.storage.MyRegExLoader

to load the records in a custom format.

150324-21:06:32:937378 [mod=STB, lvl=INFO ]
top - 21:06:33 up 3:41, 0 users, load average: 0.75, 0.95, 0.72
Tasks: 120 total, 3 running, 117 sleeping, 0 stopped, 0 zombie
Cpu(s): 21.8%us, 12.9%sy, 2.9%ni, 60.7%id, 0.0%wa, 0.0%hi, 1.7%si, 0.0%st
Mem: 317108k total, 232588k used, 84520k free, 25960k buffers
Swap: 0k total, 0k used, 0k free, 110820k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19122 root 20 0 456m 72m 37m R 72 23.5 85:50.22 Receiver
5859 root 20 0 349m 9128 6948 S 15 2.9 22:42.88 rmfStreamer
150324-21:06:32:937378 [mod=STB, lvl=INFO ]
top - 21:06:33 up 3:41, 0 users, load average: 0.75, 0.95, 0.72
Tasks: 120 total, 3 running, 117 sleeping, 0 stopped, 0 zombie
Cpu(s): 21.8%us, 12.9%sy, 2.9%ni, 60.7%id, 0.0%wa, 0.0%hi, 1.7%si, 0.0%st
Mem: 317108k total, 232588k used, 84520k free, 25960k buffers
Swap: 0k total, 0k used, 0k free, 110820k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19122 root 20 0 456m 72m 37m R 72 23.5 85:50.22 Receiver
5859 root 20 0 349m 9128 6948 S 15 2.9 22:42.88 rmfStreamer

Here is the relevant code snippet I used:

raw_logs = LOAD './main*/*top_log*'
    USING org.apache.pig.piggybank.storage.MyRegExLoader('(?m)(?s)\\d*-\\d{2}:\\d{2}:\\d{2}\\:\\d*.*')
    AS line:chararray;
DUMP raw_logs;

This is the output I get:

(150325-05:47:26:253050 [mod=STB, lvl=INFO ])
(150325-05:57:27:294069 [mod=STB, lvl=INFO ])
(150325-06:07:28:235302 [mod=STB, lvl=INFO ])
(150325-06:17:29:124282 [mod=STB, lvl=INFO ])
(150325-06:27:30:036264 [mod=STB, lvl=INFO ])
(150325-06:37:30:941804 [mod=STB, lvl=INFO ])
(150325-06:47:31:909712 [mod=STB, lvl=INFO ])

It should instead produce two tuples, like this:

(150324-21:06:32:937378 [mod=STB, lvl=INFO ]
top - 21:06:33 up 3:41, 0 users, load average: 0.75, 0.95, 0.72
Tasks: 120 total, 3 running, 117 sleeping, 0 stopped, 0 zombie
Cpu(s): 21.8%us, 12.9%sy, 2.9%ni, 60.7%id, 0.0%wa, 0.0%hi, 1.7%si, 0.0%st
Mem: 317108k total, 232588k used, 84520k free, 25960k buffers
Swap: 0k total, 0k used, 0k free, 110820k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19122 root 20 0 456m 72m 37m R 72 23.5 85:50.22 Receiver
5859 root 20 0 349m 9128 6948 S 15 2.9 22:42.88 rmfStreamer)
(150324-21:06:32:937378 [mod=STB, lvl=INFO ]
top - 21:06:33 up 3:41, 0 users, load average: 0.75, 0.95, 0.72
Tasks: 120 total, 3 running, 117 sleeping, 0 stopped, 0 zombie
Cpu(s): 21.8%us, 12.9%sy, 2.9%ni, 60.7%id, 0.0%wa, 0.0%hi, 1.7%si, 0.0%st
Mem: 317108k total, 232588k used, 84520k free, 25960k buffers
Swap: 0k total, 0k used, 0k free, 110820k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19122 root 20 0 456m 72m 37m R 72 23.5 85:50.22 Receiver
5859 root 20 0 349m 9128 6948 S 15 2.9 22:42.88 rmfStreamer)

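Please let me know which regular expression I can use so that my script treats everything from the start of one timestamp up to the start of the next timestamp as one record.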

Best answer

Try the capture group of the following regular expression:

([0-9]{6}-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]+ \[mod=[\s\S]*)[0-9]{6}-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]+ \[mod=
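
A minimal sketch of how this pattern behaves, written in Python rather than Pig so that it can be run standalone. The sample text is abridged from the log in the question. The names `HEADER`, `RECORD_RE` and `split_records` are hypothetical, and the lazy-lookahead variant is an alternative to the answer's regex, not part of it. Two properties of the answer's pattern are worth knowing: `[\s\S]*` is greedy, so with more than two records group 1 extends up to the last timestamp, and the final record (which has no timestamp after it) is never captured.

```python
import re

# Timestamp header as it appears in the log, e.g. "150324-21:06:32:937378 [mod="
HEADER = r"[0-9]{6}-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]+ \[mod="

# The regex from the answer: group 1 runs from one header up to a later header.
ANSWER_RE = re.compile("(" + HEADER + r"[\s\S]*)" + HEADER)

# Hypothetical variant (not from the answer): a lazy body that stops at the
# next header or at the end of the input, so the last record is returned too.
RECORD_RE = re.compile(HEADER + r"[\s\S]*?(?=" + HEADER + r"|\Z)")


def split_records(text):
    """Return one string per record, with trailing whitespace removed."""
    return [m.group(0).rstrip() for m in RECORD_RE.finditer(text)]


# Abridged from the sample log in the question (two identical records).
SAMPLE = (
    "150324-21:06:32:937378 [mod=STB, lvl=INFO ]\n"
    "top - 21:06:33 up 3:41, 0 users, load average: 0.75, 0.95, 0.72\n"
    "5859 root 20 0 349m 9128 6948 S 15 2.9 22:42.88 rmfStreamer\n"
    "150324-21:06:32:937378 [mod=STB, lvl=INFO ]\n"
    "top - 21:06:33 up 3:41, 0 users, load average: 0.75, 0.95, 0.72\n"
    "5859 root 20 0 349m 9128 6948 S 15 2.9 22:42.88 rmfStreamer\n"
)

if __name__ == "__main__":
    # Answer's regex: only the first of the two records ends up in group 1.
    first = ANSWER_RE.search(SAMPLE).group(1)
    print(repr(first))

    # Lazy-lookahead variant: one string per record, including the last one.
    for record in split_records(SAMPLE):
        print("(" + record + ")")
```

The lazy quantifier and the lookahead used here also exist in Java's regex engine, which Pig uses. One caveat, stated as an assumption that the source does not confirm: the piggybank regex loaders appear to read their input one line at a time, which would explain why the output in the question contains only the timestamp lines. If so, no multi-line pattern can take effect inside MyRegExLoader, and the lines would have to be grouped into records by other means, such as a custom loader or a preprocessing step.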

Regarding "regex - Parsing a log file with regular expressions in Pig", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29589241/
