
hadoop - How do I write queries that avoid a single reducer in select distinct and size(collect_set) Hive queries?

Reposted. Author: 可可西里. Updated: 2023-11-01 15:32:08

How can I rewrite these queries so that the reduce phase does not run on a single reducer? It takes forever, and I lose the benefit of parallelism.

select id
, count(distinct locations) AS unique_locations
from
mytable
;

select id
, size(collect_set(locations)) AS unique_locations
from
mytable
;

Best answer

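For count(distinct var), splitting the work into two queries works: an inner query that selects the distinct values and an outer query that counts them.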

SELECT
  count(1)
FROM (
  SELECT DISTINCT locations AS unique_locations
  FROM mytable
) t;

I think the same applies to size(collect_set(...)):

SELECT
  size(unique_locations)
FROM (
  SELECT collect_set(locations) AS unique_locations
  FROM mytable
) t;
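
The original queries also select id, which suggests a per-id count may be what is wanted. The following is only a sketch of how the same two-step idea could be extended to that case, assuming the goal is the number of distinct locations for each id. The alias deduped is hypothetical; the table and column names are taken from the question.

```sql
-- Sketch only: per-id variant of the two-step rewrite (assumes per-id counts are wanted).
-- Step 1 (inner query): reduce the table to distinct (id, locations) pairs.
--   Grouping on both columns gives Hive many keys to spread across reducers.
-- Step 2 (outer query): count the surviving rows for each id.
SELECT
  id,
  count(locations) AS unique_locations  -- count(col) skips NULLs, like count(distinct col)
FROM (
  SELECT id, locations
  FROM mytable
  GROUP BY id, locations
) deduped  -- hypothetical alias
GROUP BY id;
```

Because both stages group on a key, neither has to funnel every row through one reducer, although how many reducers Hive actually uses still depends on the data and the cluster configuration.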

Regarding "hadoop - How do I write queries that avoid a single reducer in select distinct and size(collect_set) Hive queries?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31217198/
