
hadoop - Hive writing to partitions and DEFAULT

Reposted · Author: 可可西里 · Updated: 2023-11-01 16:41:10

I have a date_time table, a fact table, and a partitioned table. The goal is to join date_time to the fact table and insert the result into the partitioned table. It is working, in the sense that I can verify that /apps/hive/warehouse/dbname.db/p_tbl/p_year=2016/p_month=01/p_day=01 and several other year/month/day folders are current. However, I also have p_year=__HIVE_DEFAULT_PARTITION__/p_month=__HIVE_DEFAULT_PARTITION__/p_day=__HIVE_DEFAULT_PARTITION__ full of data that should have its own partitions. Why do some year/month/day partitions work, while for other dates the rows are written to HIVE_DEFAULT? Yes, I renamed things to generic names for my post rather than the real db, tbl, and col names...

CREATE EXTERNAL TABLE ${DB_NAME_HIVE}.date_time(
date_time_key bigint,
label_yyyy varchar(32),
label_mm varchar(32),
label_dd varchar(32)
)
row format delimited
fields terminated by ','
lines terminated by '\n'
LOCATION '${TARGET_PATH}date_time'
;

CREATE EXTERNAL TABLE ${DB_NAME_HIVE}.p_table(
date_time_key bigint,
column_0 varchar(50),
column_1 char(32),
column_2 timestamp,
column_3 bigint,
column_4 tinyint
)
PARTITIONED BY(p_year string, p_month string, p_day string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS ORC
LOCATION '${TARGET_PATH}my_table'
;


USE ${DB_NAME_HIVE};
INSERT INTO TABLE ${DB_NAME_HIVE}.p_tbl PARTITION(p_year, p_month, p_day)
SELECT
mt.date_time_key,
mt.col_0,
mt.col_1,
mt.col_2,
mt.col_3,
mt.col_4,
dt.label_yyyy,
dt.label_mm,
dt.label_dd
FROM my_tbl mt
LEFT OUTER JOIN date_time dt on (mt.date_time_key = dt.date_time_key)
;
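
Because the insert uses a LEFT OUTER JOIN, any mt.date_time_key with no match in date_time produces NULL for label_yyyy, label_mm, and label_dd, and Hive routes those rows to the default partition. A quick diagnostic (a sketch, using the question's generic table names) is to count the unmatched keys:

```sql
-- Count fact rows whose date_time_key has no match in date_time;
-- these are the rows that end up in __HIVE_DEFAULT_PARTITION__.
SELECT COUNT(*) AS unmatched_rows
FROM my_tbl mt
LEFT OUTER JOIN date_time dt ON (mt.date_time_key = dt.date_time_key)
WHERE dt.date_time_key IS NULL;
```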

INFO : Time taken for load dynamic partitions : 12557
INFO : Loading partition {p_year=2016, p_month=11, p_day=15}
INFO : Loading partition {p_year=2016, p_month=07, p_day=20}
INFO : Loading partition {p_year=2014, p_month=08, p_day=06}
INFO : Loading partition {p_year=2015, p_month=04, p_day=05}
INFO : Loading partition {p_year=2012, p_month=10, p_day=31}
INFO : Loading partition {p_year=2014, p_month=04, p_day=16}
INFO : Loading partition {p_year=2014, p_month=09, p_day=06}
INFO : Loading partition {p_year=2014, p_month=09, p_day=09}
INFO : Loading partition {p_year=2012, p_month=11, p_day=22}
INFO : Loading partition {p_year=2014, p_month=11, p_day=19}
INFO : Loading partition {p_year=2014, p_month=09, p_day=03}
INFO : Loading partition {p_year=2013, p_month=12, p_day=24}
INFO : Loading partition {p_year=2014, p_month=10, p_day=29}
INFO : Loading partition {p_year=2015, p_month=09, p_day=04}
INFO : Loading partition {p_year=2015, p_month=05, p_day=11}
INFO : Loading partition {p_year=2016, p_month=06, p_day=13}
INFO : Loading partition {p_year=2014, p_month=09, p_day=24}
INFO : Loading partition {p_year=2014, p_month=10, p_day=21}
INFO : Loading partition {p_year=2016, p_month=01, p_day=06}
INFO : Loading partition {p_year=2014, p_month=11, p_day=05}
INFO : Loading partition {p_year=2012, p_month=12, p_day=04}
INFO : Loading partition {p_year=2016, p_month=11, p_day=25}
INFO : Loading partition {p_year=2014, p_month=10, p_day=13}
INFO : Loading partition {p_year=2013, p_month=06, p_day=21}
INFO : Loading partition {p_year=2013, p_month=06, p_day=27}
INFO : Loading partition {p_year=2014, p_month=10, p_day=22}
INFO : Loading partition {p_year=2016, p_month=11, p_day=22}
INFO : Loading partition {p_year=2012, p_month=10, p_day=26}
INFO : Loading partition {p_year=2014, p_month=08, p_day=28}
INFO : Loading partition {p_year=2013, p_month=10, p_day=21}
INFO : Loading partition {p_year=2014, p_month=09, p_day=04}
INFO : Loading partition {p_year=2013, p_month=11, p_day=14}
INFO : Loading partition {p_year=2013, p_month=10, p_day=22}
INFO : Loading partition {p_year=2014, p_month=08, p_day=26}
INFO : Loading partition {p_year=2012, p_month=10, p_day=30}
INFO : Loading partition {p_year=2014, p_month=05, p_day=23}
INFO : Loading partition {p_year=2012, p_month=10, p_day=29}
INFO : Loading partition {p_year=2014, p_month=09, p_day=02}
INFO : Loading partition {p_year=__HIVE_DEFAULT_PARTITION__, p_month=__HIVE_DEFAULT_PARTITION__, p_day=__HIVE_DEFAULT_PARTITION__}
INFO : Loading partition {p_year=2014, p_month=09, p_day=05}
INFO : Loading partition {p_year=2016, p_month=07, p_day=25}
INFO : Loading partition {p_year=2016, p_month=08, p_day=10}

Best answer

With dynamic partitioning in Hive, rows whose partition-column values are NULL are written to the Hive default partition (__HIVE_DEFAULT_PARTITION__). To resolve this, clean the source data so that the partition columns are populated, or remove the rows with NULL values.
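
A minimal sketch of the two common fixes, assuming the question's table and column names: either drop the unmatched rows by switching to an inner join, or keep them but map the missing labels to an explicit fallback partition with COALESCE (the 'unknown' partition value here is an illustrative choice, not part of the original post).

```sql
-- Option 1: an INNER JOIN silently drops fact rows with no matching
-- date_time entry, so no NULL partition values are ever produced.
INSERT INTO TABLE p_tbl PARTITION (p_year, p_month, p_day)
SELECT mt.date_time_key, mt.col_0, mt.col_1, mt.col_2, mt.col_3, mt.col_4,
       dt.label_yyyy, dt.label_mm, dt.label_dd
FROM my_tbl mt
JOIN date_time dt ON (mt.date_time_key = dt.date_time_key);

-- Option 2: keep all fact rows, but route unmatched ones to an explicit
-- 'unknown' partition instead of __HIVE_DEFAULT_PARTITION__.
INSERT INTO TABLE p_tbl PARTITION (p_year, p_month, p_day)
SELECT mt.date_time_key, mt.col_0, mt.col_1, mt.col_2, mt.col_3, mt.col_4,
       COALESCE(dt.label_yyyy, 'unknown'),
       COALESCE(dt.label_mm, 'unknown'),
       COALESCE(dt.label_dd, 'unknown')
FROM my_tbl mt
LEFT OUTER JOIN date_time dt ON (mt.date_time_key = dt.date_time_key);
```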

Regarding hadoop - Hive writing to partitions and DEFAULT, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40894027/
