gpt4 book ai didi

sql - 优雅的 HiveQL 查询

转载 作者:行者123 更新时间:2023-12-02 21:51:33 24 4
gpt4 key购买 nike

我有一个像这样行的文件:

232404812.913232|1248|ip:tcp:jxta
232404812.913238|66|ip:udp:data
232404812.913615|98|ip:udp:l2tp:ppp:ip:tcp

我执行了以下 HiveQL 命令:
CREATE EXTERNAL TABLE b_packet (timestamp string, packet_length int, protocol string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
LOCATION 's3://b-file/input/';

CREATE EXTERNAL TABLE b_packet_out (protocol string, cnt int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
LOCATION 's3://b-file/output/1/';

INSERT OVERWRITE TABLE b_packet_out SELECT 'overall',
COUNT(*) FROM b_packet GROUP BY protocol;

INSERT INTO TABLE b_packet_out SELECT 'tcp',
COUNT(*) FROM b_packet WHERE protocol REGEXP '^ip:tcp';

INSERT INTO TABLE b_packet_out SELECT 'udp',
COUNT(*) FROM b_packet WHERE protocol REGEXP '^ip:udp';

INSERT INTO TABLE b_packet_out SELECT 'icmp',
COUNT(*) FROM b_packet WHERE protocol REGEXP '^ip:icmp';

这样我在输出表中有以下内容。
hive> select * from b_packet_out;
OK
udp 2241
overall 10000
icmp 64
tcp 7633

HiveQL 查询是否有更优雅的方式,以便我可以减少行数以获得相同的输出?

最佳答案

select 
count(*) as overall,
sum( if(protocol like '^ip:tcp',1,0) as tcp,
sum( if(protocol like '^ip:udp',1.0) as udp,
sum( if(protocol like '^ip:icmp'1,0) as icmp
from b_packet

这会在一次数据传递中生成相同的计数。

如果你有更多的协议(protocol),你也可以说
选择
拆分(协议(protocol),':')[1],
数数(*)
按拆分分组(协议(protocol),':')[1]
但这不会给出总体计数。

关于sql - 优雅的 HiveQL 查询,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/20317756/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com