gpt4 book ai didi

hadoop - 数组中的 Hive 列数据

转载 作者:可可西里 更新时间:2023-11-01 14:40:07 24 4
gpt4 key购买 nike

我有一个包含小时级别数据的表格。我想在一个数组中找到小时数和所有小时的值。输入表

+-----+-----+-----+
| hour| col1| col2|
+-----+-----+-----+
| 00 | 0.0 | a |
| 04 | 0.1 | b |
| 08 | 0.2 | c |
| 12 | 0.0 | d |
+-----+-----+-----+

按照下面的解决方案中的建议,我正在使用函数来获取数组中的列值

select count(hr), 
map_values(str_to_map(concat_ws(
',',
collect_set(
concat_ws(':', reflect('java.util.UUID','randomUUID'), cast(col1 as string))
)
))) as col1_arr,
map_values(str_to_map(concat_ws(
',',
collect_set(
concat_ws(':',reflect('java.util.UUID','randomUUID'), cast(col12 as string))
)
))) as col2_arr from table;

我得到的输出,col2_arr 中的值与 col1_arr 的顺序不同。请建议我如何以相同的顺序获取不同列的数组/列表中的值。

+----------+-----------------+----------+
| count(hr)| col1_arr | col2_arr |
+----------+-----------------+----------+
| 4 | 0.0,0.1,0.2,0.0 | b,a,c,d |
+----------+----------------+-----------+

要求的输出:

+----------+-----------------+----------+
| count(hr)| col1_arr | col2_arr |
+----------+-----------------+----------+
| 4 | 0.0,0.1,0.2,0.0 | a,b,c,d |
+----------+----------------+-----------+

最佳答案

with    t as 
(
select inline
(
array
(
struct('00',0.0)
,struct('04',0.1)
,struct('08',0.2)
,struct('12',0.0)
)
) as (hour,col1)
)

select count(*),collect_list(col1),max(col1)
from t
;

+-----+-------------------+-----+
| _c0 | _c1 | _c2 |
+-----+-------------------+-----+
| 4 | [0.0,0.1,0.2,0.0] | 0.2 |
+-----+-------------------+-----+

如果要保证数组内元素的顺序,使用-

sort_array(collect_list(col1)) 

如果你想消除数组中重复的元素,使用-

collect_set(col1)

在没有 collect_list 的情况下保留重复值

with    t as 
(
select inline
(
array
(
struct('00',0.0)
,struct('04',0.0)
,struct('08',0.1)
,struct('12',0.1)
)
) as (hour,col1)
)

select map_values(str_to_map(concat_ws(',',collect_set(concat_ws(':',reflect('java.util.UUID','randomUUID'),cast(col1 as string))))))
from t
;

["0.0","0.0","0.1","0.1"]

关于hadoop - 数组中的 Hive 列数据,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/42634441/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com