gpt4 book ai didi

sql - Apache Hive 查询 HiveQL

转载 作者:行者123 更新时间:2023-12-02 21:45:42 24 4
gpt4 key购买 nike

我正在学习 Hive 并想编写优化的 HiveQL/SQL 查询

我的表如下所示:

CREATE TABLE sales (dealer VARCHAR(25), make VARCHAR(25), type VARCHAR(25), day INT);
INSERT INTO sales (dealer, make, type, day) VALUES
("Xyz", "Highlander", "SUV", "0"),
("Xyz", "Prius", "HATCH", "1"),
("Xyz", "Prius", "HATCH", "2"),
("Xyz", "Prius", "HATCH", "3"),
("Xyz", "Versa", "HATCH", "1"),
("Xyz", "Versa", "HATCH", "2"),
("Xyz", "Versa", "HATCH", "3"),
("Xyz", "S3", "SEDAN", "1"),
("Xyz", "S3", "SEDAN", "2"),
("Abc", "Forrester", "SUV", "1");

给定一个“经销商”D,我想在单个查询中计算过去 X 天内每种“类型”的前 N ​​个“品牌”。
SELECT dealer, make, type, COUNT(*) AS frequency FROM sales
WHERE day > 0 AND dealer LIKE 'Xyz' GROUP BY make, type
ORDER BY frequency DESC LIMIT 5

问题是在前 1 的“make”和“type”上使用 GROUP BY 时,我只会得到:
DEALER, MAKE, TYPE, COUNT
Xyz, Prius, Hatch, 3
Xyz, Versa, Hatch, 3
Xyz, S3, Sedan, 2
...

但我想要
Xyz, Prius, Hatch, 3
Xyz, S3, Sedan, 2
...

对于每个“类型”,前 N 个。

有人可以帮我理解如何编写这样的查询吗?

SQL fiddle
http://sqlfiddle.com/#!2/df9304/5

****更新****

似乎 rank() 会很有用:

Hive getting top n records in group by query

https://blogs.oracle.com/taylor22/entry/hive_0_11_may_15

HiveQL and rank()

最佳答案

在阅读了更多文档和链接问题的提示后:

SELECT dealer, make, rank, type FROM (
SELECT dealer, make, rank() OVER (PARTITION BY type ORDER BY count DESC) AS rank, type FROM (
SELECT dealer, make, count(*) AS count, type FROM Sales WHERE dealer = "Xyz" GROUP BY dealer, type, make
) CountedSales
) RankedSales
WHERE RankedSales.rank < 3;

内部查询执行计数,中间查询执行 rank(),外部查询限制排名。

销售表内容
hive> select * from Sales;
OK
Xyz Highlander SUV NULL
Xyz Highlander SUV NULL
Xyz Rouge SUV NULL
Xyz Rouge SUV NULL
Xyz Prius HATCH NULL
Xyz Prius HATCH NULL
Xyz Prius HATCH NULL
Xyz Versa HATCH NULL
Xyz S3 SEDAN NULL
Xyz S3 SEDAN NULL
Xyz S3 SEDAN NULL
Xyz A8 SEDAN NULL
Xyz A8 SEDAN NULL
Xyz A8 SEDAN NULL
Xyz A8 SEDAN NULL
Time taken: 0.054 seconds, Fetched: 15 row(s)

现在是实际查询。
hive> SELECT dealer, make, rank, type FROM (                                                                          
> SELECT dealer, make, rank() OVER (PARTITION BY type ORDER BY count DESC) AS rank, type FROM (
> SELECT dealer, make, count(*) AS count, type FROM Sales WHERE dealer = "Xyz" GROUP BY dealer, type, make
> ) CountedSales
> ) RankedSales
> WHERE RankedSales.rank < 3;
...
Execution completed successfully
MapredLocal task succeeded
OK
Xyz Prius 1 HATCH
Xyz Versa 2 HATCH
Xyz A8 1 SEDAN
Xyz S3 2 SEDAN
Xyz Rouge 1 SUV
Xyz Highlander 1 SUV
Time taken: 28.491 seconds, Fetched: 6 row(s)

关于sql - Apache Hive 查询 HiveQL,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/25465662/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com