
hadoop - Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive

Reposted. Author: 可可西里. Updated: 2023-11-01 14:13:34

If a table is ORC, running show create table and then executing the generated create table statement causes a problem.

With show create table, you get:

STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

However, if you create a table with these clauses, you get a cast error when selecting from it. The error looks like:

Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.BinaryComparable


To fix this, just change the create table statement to STORED AS ORC.

I have read the answer to a similar question, What is the difference between 'InputFormat, OutputFormat' & 'Stored as' in Hive?, but I still cannot figure out the reason.

Best answer

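STORED AS implies 3 things:

  1. SERDE
  2. INPUTFORMAT
  3. OUTPUTFORMAT

You defined only the last 2, which leaves the SERDE to be determined by hive.default.serde. For ORC data that default (LazySimpleSerDe) is the wrong SerDe, which is why the select fails with the ClassCastException.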

hive.default.serde
Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Added in: Hive 0.14 with HIVE-5976
The default SerDe Hive will use for storage formats that do not specify a SerDe.
Storage formats that currently do not specify a SerDe include 'TextFile, RcFile'.
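
Put differently, a statement that names all three pieces explicitly should behave like STORED AS ORC, because nothing is left to hive.default.serde. A minimal sketch, assuming a hypothetical table name mytable3 that does not appear in the original question:

```sql
-- mytable3 is a hypothetical name used only for illustration.
-- ROW FORMAT SERDE is the clause missing from the failing statement;
-- with it, the SerDe no longer falls back to hive.default.serde.
CREATE TABLE mytable3 (i int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
```

This is the same set of clauses that show create table prints for a table created with STORED AS ORC, so when replaying generated DDL the ROW FORMAT SERDE lines need to be kept together with the STORED AS lines.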

Demo

hive.default.serde

set hive.default.serde;

hive.default.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

STORED AS ORC

create table mytable (i int) 
stored as orc;

show create table mytable;

Note that the SERDE is 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'

CREATE TABLE `mytable`(
`i` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file:/home/cloudera/local_db/mytable'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1496982059')

STORED AS INPUTFORMAT ... OUTPUTFORMAT ...

create table mytable2 (i int) 
STORED AS
INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
;

show create table mytable2
;

Note that the SERDE is 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

CREATE TABLE `mytable2`(
`i` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file:/home/cloudera/local_db/mytable2'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1496982426')
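
If a table like mytable2 already exists with the wrong SerDe, recreating it is not necessarily the only option. The following is a sketch rather than a tested recipe: it assumes the files under the table location really are ORC files, and it only changes table metadata.

```sql
-- Sketch: point the existing table at the ORC SerDe.
-- This updates the table metadata only; data files are not rewritten.
ALTER TABLE mytable2 SET SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde';

-- Inspect the storage section of the output to confirm which
-- SerDe, InputFormat and OutputFormat the table now uses.
DESCRIBE FORMATTED mytable2;
```

After the change, show create table mytable2 should report the same ROW FORMAT SERDE as mytable above.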

This question, hadoop - Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive, comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/44443697/
