gpt4 book ai didi

hadoop - 无法在Hive中插入JSON文件的数据

转载 作者:行者123 更新时间:2023-12-02 20:24:50 26 4
gpt4 key购买 nike

失败了!
为以下架构创建表

(schema = {"type":"record","name":"topLevelRecord","fields":[{"name":"MESSAGE_ID","type":["string","null"]},{"name":"MSGNAME","type":["string","null"]},{"name":"SOURCE","type":["string","null"]},{"name":"EVENT_DATETIME","type":["string","null"]},{"name":"CUSTOMER_ORDER_ID","type":["string","null"]},{"name":"SP_ORGANISATION_NAME","type":["string","null"]},{"name":"CUSTOMER_ACCOUNT_ID","type":["string","null"]},{"name":"ORDER_TYPE_NAME","type":["string","null"]},{"name":"ORDER_SUBTYPE_NAME","type":["string","null"]},{"name":"ORDER_REASON_NAME","type":["string","null"]},{"name":"ORDER_CREATED_DATE","type":["string","null"]},{"name":"ORDER_CREATED_CHANNEL_NAME","type":["string","null"]},{"name":"ORDER_CREATED_RETAILER_ID","type":["string","null"]},{"name":"ORDER_CREATED_DEALER_ID","type":["string","null"]},{"name":"ORDER_CREATED_AFFILIATE_ID","type":["string","null"]},{"name":"ORDER_CREATED_EMPLOYEE_ID","type":["string","null"]},{"name":"ORDER_CREATED_CONTACT_CENTRE_AGENT_ID","type":["string","null"]},{"name":"ORDER_SUBMITTED_DATE","type":["string","null"]},{"name":"ORDER_SUBMITTED_CHANNEL_NAME","type":["string","null"]},{"name":"ORDER_DUE_DATE","type":["string","null"]},{"name":"ONE_TIME_CHARGE_AMT","type":["string","null"]},{"name":"RECURRING_CHARGE_AMT","type":["string","null"]},{"name":"ORDER_STATUS_NAME","type":["string","null"]},{"name":"ORDER_STATUS_CHANGE_REASON_NAME","type":["string","null"]},{"name":"CREATE_JOB_RUN_ID","type":"int"},{"name":"CREATE_DATE_TIME","type":"string"},{"name":"SYSTEM_ID","type":"int"},{"name":"SRC_FILE_NAME","type":"string"}]}

我是 hive 的新手,只是环顾四周就尝试了一下,并提出了以下查询
CREATE EXTERNAL TABLE governed_data.customer_order(
message_id string,
msgname string,
source string,
event_datetime string,
customer_order_id string,
sp_organisation_name string,
customer_account_id string,
order_type_name string,
order_subtype_name string,
order_reason_name string,
order_created_date string,
order_created_channel_name string,
order_created_retailer_id string,
order_created_dealer_id string,
order_created_affiliate_id string,
order_created_employee_id string,
order_created_contact_centre_agent_id string,
order_submitted_date string,
order_submitted_channel_name string,
order_due_date string,
one_time_charge_amt string,
recurring_charge_amt string,
order_status_name string,
order_status_change_reason_name string,
create_job_run_id int,
create_date_time string,
system_id int,
src_file_name string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS AVRO
location 'adl://rbsitbinsighstdlt001.azuredatalakestore.net/insights/governed_data/';

在我想在 hive 数据库中插入数据

最佳答案

您指定存储为AVRO,而serde为JsonSerde,这些属性存在冲突。

如果需要AVRO,则将serde指定为org.apache.hadoop.hive.serde2.avro.AvroSerDe,将inputformat指定为org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat,并将outputformat指定为org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat。还提供一个位置,AvroSerde将从该位置提取表的最新架构。

在此处查看示例:Creating Avro-backed Hive tables

或者简单地指定STORED AS AVRO,而不使用SerDe,Input和Output格式。尝试删除DDL中的ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'

而且,如果您希望JsonSerDe解析属性,则创建如下表:

CREATE EXTERNAL TABLE governed_data.customer_order(message_id string,
msgname string,
source string,
event_datetime string,
customer_order_id string,
sp_organisation_name string,
customer_account_id string,
order_type_name string,
order_subtype_name string,
order_reason_name string,
order_created_date string,
order_created_channel_name string,
order_created_retailer_id string,
order_created_dealer_id string,
order_created_affiliate_id string,
order_created_employee_id string,
order_created_contact_centre_agent_id string,
order_submitted_date string,
order_submitted_channel_name string,
order_due_date string,
one_time_charge_amt string,
recurring_charge_amt string,
order_status_name string,
order_status_change_reason_name string,
create_job_run_id int,
create_date_time string,
system_id int,
src_file_name string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 'adl://rbsitbinsighstdlt001.azuredatalakestore.net/insights/governed_data/'
;

另请参阅有关 JsonSerDe的文档

关于hadoop - 无法在Hive中插入JSON文件的数据,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57557079/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com