gpt4 book ai didi

hive - "Parquet record is malformed"而列数不为 0

转载 作者:行者123 更新时间:2023-12-04 00:59:34 25 4
gpt4 key购买 nike

在 AWS EMR 集群上,我尝试使用 Pyspark 将查询结果写入 parquet,但遇到以下错误:

Caused by: java.lang.RuntimeException: Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
at org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149)
at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:137)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:245)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
... 10 more

我读过如果有一些列带有 null values only 就会发生这种情况。 ,但在检查了所有列数后,情况并非如此。没有任何列是完全空的。我没有使用 Parquet ,而是尝试将结果写入文本文件,一切都很顺利。

任何可能触发此错误的线索?以下是此表中使用的所有数据类型。共有 51 列。
'array<bigint>',
'array<char(50)>',
'array<smallint>',
'array<string>',
'array<varchar(100)>',
'array<varchar(50)>',
'bigint',
'char(16)',
'char(20)',
'char(4)',
'int',
'string',
'timestamp',
'varchar(255)',
'varchar(50)',
'varchar(87)'

最佳答案

结果 Parquet 不支持空数组。如果表中有一个或多个空数组(任何类型),则会触发此错误。

一种解决方法是将空数组转换为 NULL 值。

关于hive - "Parquet record is malformed"而列数不为 0,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59674324/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com