regex - Loading log data into a Hive table using a regex SerDe gives NULL

Reposted. Author: 行者123. Updated: 2023-12-04 16:32:51

I want to parse this log sample:

May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal

May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1

May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray

May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr

May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max)

May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns

May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org

This is how I create the table and load data into it:

CREATE TABLE LogParserSample(

month_name STRING, day STRING, time STRING, host STRING, event STRING, log STRING)

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'

WITH SERDEPROPERTIES (

'input.regex' = '(^(\S+))\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+((\S+.)*)')

stored as textfile;

I am using these sites to build the regular expressions:

http://www.regexe.com/

http://rubular.com/

These are the two regular expressions I am using:

(\w{3})\s+(\w{1})\s+(\S+)\s+(\S+)\s+(\S+)\s+((\S+.)*)

(^(\S+))\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+((\S+.)*)

Load the data and select:

load data local inpath '/home/programmeur_v/serde_dataset.txt' into table LogParserSample;

select * from LogParserSample;

Every column in the output is NULL:

hive> select * from LogParserSample;

OK

NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL

Time taken: 0.094 seconds, Fetched: 7 row(s)

I am new to Hive, so I don't know what exactly the problem is.

Best answer

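When creating a Hive table with the regex SerDe, we need to write the regular expression in its Java-equivalent form. In practice that means doubling every backslash inside the 'input.regex' string literal (\\S and \\s instead of \S and \s), just as in a Java string.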
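The effect of the missing escapes can be sketched outside Hive. The example below assumes that Hive's string-literal parsing drops a backslash placed before a character that is not a recognized escape; `dropBackslashes` is a hypothetical, simplified stand-in for that step, not Hive's actual code, and the class and method names are made up for illustration. It also counts capturing groups, since the pattern from the question defines more groups than the table has columns.

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    // Text between the quotes in the question's DDL (single backslashes),
    // written here as a Java literal.
    static final String QUESTION_LITERAL =
        "(^(\\S+))\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+((\\S+.)*)";

    // Text between the quotes in the answer's DDL (doubled backslashes),
    // written here as a Java literal, hence the four backslashes.
    static final String ANSWER_LITERAL =
        "(\\\\S+)\\\\s+(\\\\S+)\\\\s+(\\\\S+)\\\\s+(\\\\S+)\\\\s+(\\\\S+)\\\\s+(.*)";

    // ASSUMPTION: simplified model of literal unescaping. Each backslash is
    // dropped and the character after it is kept as-is.
    static String dropBackslashes(String literal) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == '\\' && i + 1 < literal.length()) {
                sb.append(literal.charAt(++i)); // keep the escaped character
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // True only when the regex matches the entire line.
    static boolean matchesLine(String regex, String line) {
        return Pattern.compile(regex).matcher(line).matches();
    }

    // Number of capturing groups defined by the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String line = "May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1";

        String fromQuestion = dropBackslashes(QUESTION_LITERAL); // (^(S+))s+(S+)...
        String fromAnswer = dropBackslashes(ANSWER_LITERAL);     // (\S+)\s+(\S+)...

        System.out.println(matchesLine(fromQuestion, line)); // false
        System.out.println(matchesLine(fromAnswer, line));   // true

        // The table has 6 columns; compare with the groups each pattern defines.
        System.out.println(groupCount(QUESTION_LITERAL)); // 8
        System.out.println(groupCount(fromAnswer));       // 6
    }
}
```

Under this assumption the single-backslash pattern reaches the regex engine as literal `S` and `s` characters, so no log line matches. Separately, its nested parentheses define eight groups for a six-column table, so the simpler six-group pattern is the safer choice even once the escaping is fixed.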

Try the DDL below:

hive> CREATE TABLE LogParserSample(
month_name STRING, day STRING, time STRING, host STRING, event STRING, log STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'input.regex' = '(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.*)')
stored as textfile;

hive> select * from LogParserSample;
+-------------+------+-----------+-----------+----------------------+-----------------------------------------------------------------------------------------------------+--+
| month_name | day | time | host | event | log |
+-------------+------+-----------+-----------+----------------------+-----------------------------------------------------------------------------------------------------+--+
| May | 3 | 11:52:54 | cdh-dn03 | init: | tty (/dev/tty6) main process (1208) killed by TERM signal |
| May | 3 | 11:53:31 | cdh-dn03 | kernel: | registered taskstats version 1 |
| May | 3 | 11:53:31 | cdh-dn03 | kernel: | sr0: scsi3-mmc drive: 32x/32x xa/form2 tray |
| May | 3 | 11:53:31 | cdh-dn03 | kernel: | piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr |
| May | 3 | 11:53:31 | cdh-dn03 | kernel: | nf_conntrack version 0.5.0 (7972 buckets, 31888 max) |
| May | 3 | 11:53:57 | cdh-dn03 | kernel: | hrtimer: interrupt took 11250457 ns |
| May | 3 | 11:53:59 | cdh-dn03 | ntpd_initres[1705]: | host name not found: 0.rhel.pool.ntp.org |
+-------------+------+-----------+-----------+----------------------+-----------------------------------------------------------------------------------------------------+--+
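Before loading data, the same pattern can be checked against a sample line with plain `java.util.regex`. This is a minimal sketch: the class name `SyslogRegexCheck` and the `parse` helper are hypothetical, and using `matches()` (whole-line match) is an assumption about how the SerDe applies the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SyslogRegexCheck {
    // Same pattern as 'input.regex' in the DDL. The doubled backslashes are
    // Java string escaping; the regex engine sees (\S+)\s+(\S+)... and so on.
    static final Pattern LINE = Pattern.compile(
        "(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.*)");

    // Returns the captured fields in order, or null when the whole line
    // does not match the pattern.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // group 0 is the whole match
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample =
            "May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org";
        String[] columns = {"month_name", "day", "time", "host", "event", "log"};

        String[] fields = parse(sample);
        if (fields == null) {
            System.out.println("no match");
            return;
        }
        // Print each table column next to the group that would fill it.
        for (int i = 0; i < columns.length; i++) {
            System.out.println(columns[i] + " = " + fields[i]);
        }
    }
}
```

A line that yields `null` here would be expected to show up as a row of NULLs in the table, which makes this a quick way to test a pattern before running the `load data` step.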

Use this link to generate the Java-equivalent regular expression.

Regarding "regex - Loading log data into a Hive table using a regex SerDe gives NULL", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51584426/
