
hive - How to insert/copy data from one partition into multiple partitions in Hive?


I have data for day='2019-01-01' in my Hive table, and I want to copy the same data to every day of Jan-2019 (i.e. into '2019-01-02', '2019-01-03', ..., '2019-01-31').

I tried the following, but the data is only inserted into '2019-01-02' and not into '2019-01-03':

INSERT OVERWRITE TABLE db_t.students PARTITION(dt='2019-01-02', dt='2019-01-03')
SELECT id, name, marks FROM db_t.students WHERE dt='2019-01-01';
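
For what it's worth, inserting into a single static partition works as expected, so the problem seems to be naming two values for dt in one PARTITION clause:

INSERT OVERWRITE TABLE db_t.students PARTITION(dt='2019-01-02')
SELECT id, name, marks FROM db_t.students WHERE dt='2019-01-01';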

Best Answer

Cross join all your data with a calendar of the dates in the required range, and use dynamic partitioning:

set hivevar:start_date=2019-01-02;
set hivevar:end_date=2019-01-31;

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

with date_range as
( -- this query generates the date range
  select date_add('${hivevar:start_date}', s.i) as dt
  from ( select posexplode(split(space(datediff('${hivevar:end_date}','${hivevar:start_date}')),' ')) as (i,x) ) s
)
INSERT OVERWRITE TABLE db_t.students PARTITION(dt)
SELECT id, name, marks, r.dt -- the dynamic partition column must be the last column in the SELECT
FROM db_t.students s
CROSS JOIN date_range r
WHERE s.dt='2019-01-01'
DISTRIBUTE BY r.dt; -- send each date to a single reducer, so each partition is written by one reducer
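
To see what the calendar CTE produces, you can run the date-generating subquery on its own (same hivevar settings as above):

set hivevar:start_date=2019-01-02;
set hivevar:end_date=2019-01-31;

-- space(n) returns a string of n blanks; splitting it on ' ' yields n+1
-- empty tokens, and posexplode numbers them 0..n. Adding those offsets to
-- start_date yields every day from start_date through end_date inclusive.
select date_add('${hivevar:start_date}', s.i) as dt
from ( select posexplode(split(space(datediff('${hivevar:end_date}','${hivevar:start_date}')),' ')) as (i,x) ) s;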

Another possible solution is to copy the partition data with hadoop fs -cp or hadoop distcp (repeating for each partition, or using a loop in the shell):

hadoop fs -cp '/usr/warehouse/students/dt=2019-01-01' '/usr/warehouse/students/dt=2019-01-02'
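
For example, a shell loop along these lines (a sketch, assuming the warehouse path above). Note that after copying files directly on HDFS, the metastore still has to be told about the new partitions, e.g. via MSCK REPAIR TABLE or ALTER TABLE ... ADD PARTITION:

#!/bin/bash
# Copy the dt=2019-01-01 directory to dt=2019-01-02 ... dt=2019-01-31.
src='/usr/warehouse/students/dt=2019-01-01'
for day in $(seq -w 2 31); do   # seq -w zero-pads: 02, 03, ..., 31
    hadoop fs -cp "$src" "/usr/warehouse/students/dt=2019-01-${day}"
done

# Register the new partition directories in the Hive metastore:
hive -e 'MSCK REPAIR TABLE db_t.students;'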

And one more solution, using UNION ALL:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE db_t.students PARTITION(dt)
SELECT id, name, marks, '2019-01-02' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
UNION ALL
SELECT id, name, marks, '2019-01-03' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
UNION ALL
SELECT id, name, marks, '2019-01-04' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
UNION ALL
...
;
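
Whichever approach you use, a quick sanity check is to list the partitions and compare per-day row counts:

SHOW PARTITIONS db_t.students;

-- Every copied day should report the same count as the source day:
SELECT dt, count(*) AS cnt
FROM db_t.students
GROUP BY dt;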

On the topic of "hive - How to insert/copy data from one partition into multiple partitions in Hive?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56071235/
