java - org.apache.parquet.schema.InvalidSchemaException: A group type can not be empty. Parquet does not support empty group without leaves

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:12:33
I am trying to write some records to a Parquet file in Java.

Here is my sample code:

import org.apache.avro.reflect.ReflectData;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

import java.util.Date;
import java.util.Set;

import static org.apache.parquet.hadoop.ParquetFileWriter.Mode.OVERWRITE;
import static org.apache.parquet.hadoop.metadata.CompressionCodecName.SNAPPY;

public class App {
    public static void main(String[] args) {
        Path dataFile = new Path("/tmp/UpdateMetaData.snappy.parquet");

        // try-with-resources closes the writer even if building or writing fails
        try (ParquetWriter<UpdateMeta> writer = AvroParquetWriter.<UpdateMeta>builder(dataFile)
                .withSchema(ReflectData.AllowNull.get().getSchema(UpdateMeta.class))
                .withDataModel(ReflectData.get())
                .withConf(new Configuration())
                .withCompressionCodec(SNAPPY)
                .withWriteMode(OVERWRITE)
                .build()) {
            // Records would be written here via writer.write(...)
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

class UpdateMeta {
    String updatedBy;
    Date updatedAt;  // the field named in the exception below
    Set<EmailContentField> emailContentField;
}

But I get the following exception:

org.apache.parquet.schema.InvalidSchemaException: A group type can not be empty. Parquet does not support empty group without leaves. Empty group: updatedAt
    at org.apache.parquet.schema.GroupType.<init>(GroupType.java:92)
    at org.apache.parquet.schema.GroupType.<init>(GroupType.java:48)
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:132)
    at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:174)
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:151)
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:112)
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:187)
    at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:106)
    at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:97)
    at org.apache.parquet.avro.AvroParquetWriter.writeSupport(AvroParquetWriter.java:144)
    at org.apache.parquet.avro.AvroParquetWriter.access$100(AvroParquetWriter.java:35)
    at org.apache.parquet.avro.AvroParquetWriter$Builder.getWriteSupport(AvroParquetWriter.java:173)
    at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:489)
    at com.gartner.emailactivityimporter.dao.App.main(App.java:26)

Here are the dependencies I use in my pom file:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0</version>
    <exclusions>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>1.8.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.parquet/parquet-avro -->
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-avro</artifactId>
    <version>1.8.1</version>
</dependency>

Please help me resolve this exception.

Thanks

Best answer

With the dependency versions listed above, a Date/Timestamp field cannot be written directly to Parquet.
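One plausible reading of the stack trace, offered as an assumption rather than something the answer states: Avro's reflection-based schema generation skips static and transient fields, and every instance field of java.util.Date is transient, so updatedAt would map to a record with no fields, which Parquet rejects as an empty group. The sketch below uses only JDK reflection to list the fields that would survive such a filter. The names DateFieldInspector, persistentFields, and Sample are hypothetical and exist only for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

final class DateFieldInspector {

    // Hypothetical comparison type: one ordinary field, one transient, one static.
    static class Sample {
        String name;
        transient int cache;
        static int count;
    }

    // Returns the names of declared fields that are neither static nor transient.
    // Assumption: this mirrors the filter a reflection-based schema generator applies.
    static List<String> persistentFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            if (!Modifier.isStatic(modifiers) && !Modifier.isTransient(modifiers)) {
                names.add(field.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Sample: " + persistentFields(Sample.class));
        System.out.println("Date:   " + persistentFields(Date.class));
    }
}
```

If the list printed for Date is empty, that would be consistent with the "Empty group: updatedAt" message in the exception.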

So we need to convert the Date/Timestamp to a String or a long before writing. With that change, the write succeeded.
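A minimal sketch of that conversion, using only the JDK. UpdateMetaRow and its members are hypothetical names, not part of the original code; it stores the timestamp both as epoch milliseconds (long) and as an ISO-8601 string, although in practice one of the two is enough. The emailContentField member is left out because its element type is not shown in the question.

```java
import java.util.Date;
import java.util.Objects;

// Hypothetical Parquet-friendly counterpart of UpdateMeta: no java.util.Date field.
final class UpdateMetaRow {
    String updatedBy;
    long updatedAtMillis;   // milliseconds since 1970-01-01T00:00:00Z
    String updatedAtIso;    // ISO-8601 instant in UTC

    // Converts the Date into primitive/String representations before writing.
    static UpdateMetaRow of(String updatedBy, Date updatedAt) {
        Objects.requireNonNull(updatedAt, "updatedAt");
        UpdateMetaRow row = new UpdateMetaRow();
        row.updatedBy = updatedBy;
        row.updatedAtMillis = updatedAt.getTime();
        row.updatedAtIso = updatedAt.toInstant().toString();
        return row;
    }

    // Rebuilds the Date after reading the record back.
    Date updatedAtAsDate() {
        return new Date(updatedAtMillis);
    }

    public static void main(String[] args) {
        UpdateMetaRow row = UpdateMetaRow.of("alice", new Date());
        System.out.println(row.updatedAtMillis + " / " + row.updatedAtIso);
    }
}
```

Under the answer's approach, a class shaped like this would be passed to the schema and writer in place of UpdateMeta. The long form is compact and sorts naturally, while the String form is easier to read in query tools.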

If you have any other solution or suggestion, please comment.

Thanks

Regarding "java - org.apache.parquet.schema.InvalidSchemaException: A group type can not be empty. Parquet does not support empty group without leaves", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59502220/
