
hadoop - Pig script for parsing AWS ELB logs

Repost. Author: 可可西里. Updated: 2023-11-01 16:42:17

I am trying to parse the ELB log line below with Pig, and I can parse it successfully with the following script.

2016-07-16T00:00:41.700161Z testelb 11.11.17.2:50883 192.168.1.94:80 0.00002 0.001392 0.000019 200 200 0 43 "GET http://test.example.com:80/bac?aid=b5cf542d74&cid=etrsewtp&bid=23c45c543&dte=Sat%20Jul%2016%202016%2008:00:41%20GMT+0800%20(HKT) HTTP/1.1"."Mozilla2Phone iPhone OS_9;03 OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Mobile/13F69"- -

***************************************************************
A = LOAD '/tmp/one.log' USING TextLoader AS (line:chararray);

B = FOREACH A GENERATE FLATTEN (
REGEX_EXTRACT_ALL(
line,'^(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) "(.+?)" "(.+?)" (\\S+) (\\S+)')
) AS (
timestamp:chararray, elb:chararray, client_port:chararray, backend_port:chararray, request_processing_time:float, backend_processing_time:float, response_processing_time:float, elb_status_code:int, backend_status_code:int, received_bytes:int, sent_bytes:int, request:chararray, user_agent:chararray, ssl_cipher:chararray, ssl_protocol:chararray
);

DUMP B;

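Now I want to extract the request URL and the aid, bid, cid, etc. query parameters, but I cannot get a regular expression to match them. Can someone help me get these details?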

If there is any way other than the regex approach above to get the full ELB log details, I would like to know about it.

Note: the positions of aid, bid, and cid within the request are not fixed.

Best answer

Your question has already been answered here.

An alternate way to do the same task requires a custom loader.
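Because aid, bid, and cid can appear in any order, one possible approach is to extract each parameter with its own pattern rather than one positional regex. The following is a minimal sketch, not the linked answer's method. It reuses the path /tmp/one.log from the question; the relation names A, B, C and the field names url, aid, bid, cid, dte are chosen here for illustration. It also assumes that REGEX_EXTRACT matches a pattern anywhere in the string (find-style matching), which is its default behaviour in recent Pig releases as far as I know; if your version requires a full match, wrap each pattern in .* on both sides.

```pig
-- Load each log line as a single chararray, as in the question.
A = LOAD '/tmp/one.log' USING TextLoader() AS (line:chararray);

-- The first quoted field is the request: "METHOD URL PROTOCOL".
-- Capture only the URL (the middle token).
B = FOREACH A GENERATE
    REGEX_EXTRACT(line, '"\\S+ (\\S+) \\S+"', 1) AS url:chararray;

-- Extract every query parameter independently, so order does not matter.
-- [?&] anchors the name at a parameter boundary (so "aid" is not matched
-- inside a longer name), and [^&\\s]* stops at the next '&' or whitespace.
-- A parameter that is absent yields null.
C = FOREACH B GENERATE
    url,
    REGEX_EXTRACT(url, '[?&]aid=([^&\\s]*)', 1) AS aid:chararray,
    REGEX_EXTRACT(url, '[?&]bid=([^&\\s]*)', 1) AS bid:chararray,
    REGEX_EXTRACT(url, '[?&]cid=([^&\\s]*)', 1) AS cid:chararray,
    REGEX_EXTRACT(url, '[?&]dte=([^&\\s]*)', 1) AS dte:chararray;

DUMP C;
```

The extracted values stay URL-encoded (for example, dte keeps its %20 sequences); decoding them would need a UDF, which is outside this sketch.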

Regarding hadoop - Pig script for parsing AWS ELB logs, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39775633/
