gpt4 book ai didi

json - 如何将Hive查询结果以json格式存储在文件中?

转载 作者:行者123 更新时间:2023-12-02 18:57:46 24 4
gpt4 key购买 nike

我想将配置单元查询结果存储到JSON格式的文件中。通过Brickhouse jar,我可以获得JSON格式的查询输出,但是无法将其存储在文件或表中。我正在尝试查询如下。当INSERT OVERWRITE查询运行时,它会给出错误;我该如何解决这个错误?有没有一种方法可以通过查询以JSON格式存储查询结果?

查询:

add jar hdfs:///mydir/brickhouse-0.7.1.jar;

INSERT OVERWRITE DIRECTORY '/mydir/textfile1'
stored as textfile
SELECT to_json( named_struct( "id",id,
"name",name))
FROM link_tbl;

错误:
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: INSERT OVERWRITE DIRECTORY '/mydir/text...pl(Stage-1)
INFO :

INFO : Status: Running (Executing on YARN cluster with App id application_1571318954298_0001)

INFO : Map 1: -/-
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1571318954298_0001_1_00, diagnostics=[Vertex vertex_1571318954298_0001_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:70)
at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:151)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4536)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$4300(VertexImpl.java:202)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:3352)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3301)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3282)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1862)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:201)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1978)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1964)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
... 25 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://sandbox.hortonworks.com:8020/tmp/hive/hive/2eaf13cf-1f98-4a2d-8f76-4e9c839f355b/hive_2019-10-17_13-33-05_763_197979924455130156-2/hive/_tez_scratch_dir/d9d1df72-f68c-4c1f-b642-85a46f32a79f/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 19963874, Size: 113
Serialization trace:
_mainHash (org.codehaus.jackson.sym.BytesToNameCanonicalizer)
_rootByteSymbols (org.codehaus.jackson.JsonFactory)
jsonFactory (brickhouse.udf.json.ToJsonUDF)
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:472)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:311)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:101)
... 30 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 19963874, Size: 113
Serialization trace:
_mainHash (org.codehaus.jackson.sym.BytesToNameCanonicalizer)
_rootByteSymbols (org.codehaus.jackson.JsonFactory)
jsonFactory (brickhouse.udf.json.ToJsonUDF)
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:745)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1173)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:1062)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:1076)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:432)
... 32 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 19963874, Size: 113
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:820)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:743)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
... 65 more
]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0

最佳答案

解决方案可以在此目录的顶部创建表,并使用JSONSerDe的功能。

创建表:

CREATE EXTERNAL TABLE mydirectory_tbl(
id string,
name string
)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/mydir' --this is HDFS/S3 location
;

插入数据:
INSERT OVERWRITE table mydirectory_tbl
SELECT id,name
FROM link_tbl;

而且您不能在表或目录位置指定文件名。仅目录。如果只需要一个文件,则可以稍后连接文件(最好是性能更高的文件),也可以通过添加 ORDER BY id来强制单个化简器。

关于json - 如何将Hive查询结果以json格式存储在文件中?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/58434042/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com