
hadoop - Find the top 5 regions, and the top 5 customers by price within each region (Hive)

Reposted. Author: 行者123. Updated: 2023-12-02 21:47:32

We have a requirement to find the top N regions by total price, and then the top N customers within each of those regions.

Sample data:

REGION_NAME,CUSTOMER_NAME,PRICE

RG1,Customer1,100
RG1,Customer2,200
RG1,Customer3,100
RG2,Customer4,100
RG2,Customer5,200
RG2,Customer6,400
RG3,Customer7,100
RG3,Customer8,200
RG3,Customer9,500
RG3,Customer9,200

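Suppose we want the top 2 regions by total price, and the top 2 customers within each of those regions. The expected output is:

Region name, region total, customer name, customer price (sum)
RG3,1000,Customer9,700 (sum of customer prices)
RG3,1000,Customer8,200
RG2,700,Customer6,400
RG2,700,Customer5,200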

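How can we write a Hive query for this? We can't work out how to express it in Hive. Do we have to write MapReduce or Pig instead?

Best answer

You can do this in Hive with analytic (window) functions and a self-join: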

select regions_ranked.region_name, regions_ranked.region_sum,
       customers_ranked.customer_name, customers_ranked.customer_sum
from
(
  -- rank customers within each region by their total price
  select region_name, customer_name, customer_sum,
         rank() over (partition by region_name order by customer_sum desc) as customer_rank
  from (
    select region_name, customer_name, sum(price) as customer_sum
    from foo group by region_name, customer_name
  ) customers_sum
) customers_ranked
join
(
  -- rank regions by their total price
  select region_name, region_sum, rank() over (order by region_sum desc) as region_rank
  from (
    select region_name, sum(price) as region_sum
    from foo group by region_name
  ) regions_sum
) regions_ranked
on customers_ranked.region_name = regions_ranked.region_name
where region_rank <= 2 and customer_rank <= 2;

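This returns exactly the rows you are looking for, although not necessarily in the right order. If you need them sorted, add an "order by" clause at the end.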
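
As a rough sketch of the ordering step mentioned above, the same logic could be written with an "order by" at the end. This version is an illustration rather than part of the original answer: it assumes the same table foo, a Hive version that supports WITH clauses (0.13.0 or later), and the short aliases c and r are made up here for readability.

```sql
-- Sketch only: same ranking logic as the answer, plus a final sort.
-- Assumes table foo(region_name, customer_name, price) and Hive 0.13.0+ for WITH.
with customers_ranked as (
  select region_name, customer_name, customer_sum,
         rank() over (partition by region_name order by customer_sum desc) as customer_rank
  from (
    select region_name, customer_name, sum(price) as customer_sum
    from foo group by region_name, customer_name
  ) customers_sum
),
regions_ranked as (
  select region_name, region_sum,
         rank() over (order by region_sum desc) as region_rank
  from (
    select region_name, sum(price) as region_sum
    from foo group by region_name
  ) regions_sum
)
select r.region_name, r.region_sum, c.customer_name, c.customer_sum
from customers_ranked c
join regions_ranked r on c.region_name = r.region_name
where r.region_rank <= 2 and c.customer_rank <= 2
-- largest region first; region_name keeps regions with equal totals together
order by region_sum desc, region_name, customer_sum desc;
```

On the sample data this should list the RG3 rows (Customer9, then Customer8) before the RG2 rows (Customer6, then Customer5). Note that rank() keeps ties, so a region or customer tied at the cutoff can produce more than N rows; row_number() is a possible substitute if exactly N rows are required.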

Regarding "hadoop - Find the top 5 regions, and the top 5 customers by price within each region (Hive)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23881926/
