
hadoop - Creating a Hive table from an AVSC that references a previously defined schema as a type

Reposted. Author: 行者123. Updated: 2023-12-02 20:46:25

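I am looking for a way to take the contents of the following AVSC file and, for use with Hive, externalize the nested schema "RENTALRECORDTYPE" so that the schema can be reused.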

{
"type": "record",
"name": "EMPLOYEE",
"namespace": "",
"doc": "EMPLOYEE is a person that works here",
"fields": [
{
"name": "RENTALRECORD",
"type": {
"type": "record",
"name": "RENTALRECORDTYPE",
"namespace": "",
"doc": "Rental record is a record that is kept on every item rented",
"fields": [
{
"name": "due_date",
"doc": "The date when item is due",
"type": "int"
}
]
}
},
{
"name": "hire_date",
"doc": "Employee date of hire",
"type": "int"
}
]
}

Defining the schema this way works fine. I can issue the following HiveQL statement and the table is created successfully.
CREATE EXTERNAL TABLE employee
STORED AS AVRO
LOCATION '/user/dtom/store/data/employee'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/employee.avsc');

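However, I would like to reference an existing schema instead of duplicating the record definition in multiple schemas. For example, instead of a single schema file there would be two AVSC files: rentalrecord.avsc and employee.avsc.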

rentalrecord.avsc
{
"type": "record",
"name": "RENTALRECORD",
"namespace": "",
"doc": "A record that is kept for every rental",
"fields": [
{
"name": "due_date",
"doc": "The date on which the rental is due back to the store",
"type": "int"
}
]
}

employee.avsc
{
"type": "record",
"name": "EMPLOYEE",
"namespace": "",
"doc": "EMPLOYEE is a person that works for the VIDEO STORE",
"fields": [
{
"name": "rentalrecord",
"doc": "A rental record is a record on every rental",
"type": "RENTALRECORD"
},
{
"name": "hire_date",
"doc": "Employee date of hire",
"type": "int"
}
]
}

In the case above, we want to externalize the RENTALRECORD schema definition and reuse it in employee.avsc and elsewhere.

Importing the schemas with the following two HiveQL statements fails:
CREATE EXTERNAL TABLE rentalrecord
STORED AS AVRO
LOCATION '/user/dtom/store/data/rentalrecord'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/rentalrecord.avsc');

CREATE EXTERNAL TABLE employee
STORED AS AVRO
LOCATION '/user/dtom/store/data/employee'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/employee.avsc');

rentalrecord.avsc is imported successfully, but employee.avsc fails on the first field definition, the field of type "RENTALRECORD". Hive outputs the following error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Encountered exception determining schema. Returning signal schema to indicate problem: "RENTALRECORD" is not a defined name. The type of the "rentalrecord" field must be a defined name or a {"type": ...} expression.)



My research tells me that Avro files do support this form of schema reuse. So either I am missing something, or this is something Hive does not support.

Any help would be greatly appreciated.

Best answer

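I defined an AVDL containing all the references, then used the avro-tools jar with the idl2schemata option to generate the avsc files. The generated avsc worked with Hive like a charm!!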
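The error message suggests that Hive parses the file named by avro.schema.url on its own, so a bare name such as "RENTALRECORD" only resolves if its definition sits in that same file. As far as I know, the schemas emitted by idl2schemata are self-contained for that reason: each referenced type is written out in full at its first use. The Python sketch below illustrates that inlining step by hand. It is not the avro-tools implementation, the helper name inline_named_types is hypothetical, the "doc" attributes are omitted for brevity, and it assumes simple names with no namespaces (as in the question).

```python
import json

# Schemas from the question, abbreviated ("doc" attributes omitted).
RENTALRECORD = {
    "type": "record",
    "name": "RENTALRECORD",
    "namespace": "",
    "fields": [{"name": "due_date", "type": "int"}],
}

EMPLOYEE = {
    "type": "record",
    "name": "EMPLOYEE",
    "namespace": "",
    "fields": [
        {"name": "rentalrecord", "type": "RENTALRECORD"},
        {"name": "hire_date", "type": "int"},
    ],
}


def inline_named_types(schema, known, seen=None):
    """Return a copy of `schema` in which each name found in `known`
    is replaced by its full definition at the first place it is used.

    Hypothetical helper: handles simple names only (no namespaces).
    """
    if seen is None:
        seen = set()

    if isinstance(schema, str):
        # A bare name: expand it once; later uses stay as a reference,
        # because Avro allows a name to be defined only once per schema.
        if schema in known and schema not in seen:
            return inline_named_types(known[schema], known, seen)
        return schema

    if isinstance(schema, list):
        # A union: resolve every branch.
        return [inline_named_types(branch, known, seen) for branch in schema]

    if isinstance(schema, dict):
        out = dict(schema)
        kind = schema.get("type")
        if kind == "record":
            seen.add(schema["name"])
            out["fields"] = [
                {**field, "type": inline_named_types(field["type"], known, seen)}
                for field in schema["fields"]
            ]
        elif kind == "array":
            out["items"] = inline_named_types(schema["items"], known, seen)
        elif kind == "map":
            out["values"] = inline_named_types(schema["values"], known, seen)
        else:
            # Primitive, enum, fixed, or a nested {"type": ...} wrapper.
            out["type"] = inline_named_types(kind, known, seen)
        return out

    return schema


if __name__ == "__main__":
    resolved = inline_named_types(EMPLOYEE, {"RENTALRECORD": RENTALRECORD})
    # The printed JSON could be saved as a standalone employee.avsc.
    print(json.dumps(resolved, indent=2))
```

The result has the same shape as the single-file schema at the top of the question, so rentalrecord.avsc can remain the one place where the record is maintained while Hive still receives a file it can parse by itself.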

Regarding "hadoop - Creating a Hive table from an AVSC that references a previously defined schema as a type", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47836004/
