
parquet - Documentation for Apache's Parquet Java API?

Reposted. Author: 行者123. Updated: 2023-12-04 05:36:40

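I want to use Apache's parquet-mr project to read and write Parquet files programmatically from Java. I can't find any documentation on how to use this API, apart from reading the source code and seeing how it is used there. Does any such documentation exist?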

Best answer

I wrote a blog post about reading Parquet files ( http://www.jofre.de/?p=1459 ) and came up with the following solution, which can even read INT96 fields.
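INT96 columns are a common stumbling block because the raw value is just 12 bytes. In files written by Impala, Hive or older Spark versions, INT96 usually holds a legacy timestamp: 8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number. The sketch below assumes exactly that layout (other writers may use INT96 differently), and the class name `Int96Decoder` is hypothetical. With the reading code from this answer, `g.getInt96(field, index).getBytes()` should give you the 12 bytes to pass in.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper: decodes the legacy Impala/Hive INT96 timestamp layout.
// Assumed layout: bytes 0-7 = nanos of day (LE long), bytes 8-11 = Julian day (LE int).
public class Int96Decoder {
    // Julian day number of the Unix epoch, 1970-01-01.
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;

    public static Instant toInstant(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be 12 bytes, got " + int96.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0-7
        int julianDay = buf.getInt();    // bytes 8-11
        long epochDay = julianDay - JULIAN_DAY_OF_EPOCH;
        // ofEpochSecond normalizes a nano adjustment larger than one second.
        return Instant.ofEpochSecond(epochDay * SECONDS_PER_DAY, nanosOfDay);
    }

    // Inverse of toInstant, useful for building test values.
    public static byte[] fromInstant(Instant t) {
        long epochDay = Math.floorDiv(t.getEpochSecond(), SECONDS_PER_DAY);
        long secondOfDay = Math.floorMod(t.getEpochSecond(), SECONDS_PER_DAY);
        long nanosOfDay = secondOfDay * 1_000_000_000L + t.getNano();
        return ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt((int) (epochDay + JULIAN_DAY_OF_EPOCH))
                .array();
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2017-05-02T12:34:56.789Z");
        System.out.println(toInstant(fromInstant(t))); // prints 2017-05-02T12:34:56.789Z
    }
}
```

Whether the resulting instant should be read as UTC or local time depends on the writer's settings, so check that against a known value from your data.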

You need the following Maven dependencies:

<dependencies>
    <dependency>
        <groupId>org.apache.parquet</groupId>
        <artifactId>parquet-hadoop</artifactId>
        <version>1.9.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.0</version>
    </dependency>
</dependencies>

The code is basically:

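import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.column.page.PageReadStore;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
import org.apache.parquet.format.converter.ParquetMetadataConverter;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;
import org.apache.parquet.io.ColumnIOFactory;
import org.apache.parquet.io.MessageColumnIO;
import org.apache.parquet.io.RecordReader;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.Type;

public class Main {

    // Use forward slashes in the URI; Hadoop only rewrites backslashes when running on Windows.
    private static Path path = new Path("file:///C:/Users/file.snappy.parquet");

    // Prints every value of every primitive field in one record.
    // Nested groups are skipped here; System.out.println(g) prints them as well.
    private static void printGroup(Group g) {
        int fieldCount = g.getType().getFieldCount();
        for (int field = 0; field < fieldCount; field++) {
            // Repeated fields can hold several values; optional fields may hold none.
            int valueCount = g.getFieldRepetitionCount(field);

            Type fieldType = g.getType().getType(field);
            String fieldName = fieldType.getName();

            for (int index = 0; index < valueCount; index++) {
                if (fieldType.isPrimitive()) {
                    System.out.println(fieldName + " " + g.getValueToString(field, index));
                }
            }
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();

        try {
            // The footer holds the schema and the row-group metadata.
            ParquetMetadata readFooter = ParquetFileReader.readFooter(conf, path, ParquetMetadataConverter.NO_FILTER);
            MessageType schema = readFooter.getFileMetaData().getSchema();
            ParquetFileReader r = new ParquetFileReader(conf, path, readFooter);

            PageReadStore pages = null;
            try {
                // Read the file one row group at a time.
                while (null != (pages = r.readNextRowGroup())) {
                    final long rows = pages.getRowCount();
                    System.out.println("Number of rows: " + rows);

                    // Reassemble the column pages of this row group into Group records.
                    final MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
                    final RecordReader<Group> recordReader = columnIO.getRecordReader(pages, new GroupRecordConverter(schema));
                    for (int i = 0; i < rows; i++) {
                        final Group g = recordReader.read();
                        printGroup(g);

                        // TODO Compare to System.out.println(g);
                    }
                }
            } finally {
                r.close();
            }
        } catch (IOException e) {
            System.out.println("Error reading parquet file.");
            e.printStackTrace();
        }
    }
}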
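`ParquetFileReader.readFooter` relies on the fixed layout at the end of every Parquet file: the file starts with the magic bytes `PAR1`, and ends with the Thrift-encoded `FileMetaData`, a 4-byte little-endian length of that metadata, and `PAR1` again. The stdlib-only sketch below (hypothetical class `ParquetTail`) only locates the footer; decoding the Thrift metadata itself is left to the library. It reads the whole file into memory for simplicity, whereas a real reader would seek to the end instead.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper that locates the footer of a Parquet file held in memory.
public class ParquetTail {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    // Returns the length of the FileMetaData, read from the 4 bytes before the trailing magic.
    public static int footerLength(byte[] file) {
        // Smallest possible layout: "PAR1" + 4-byte length + "PAR1".
        if (file.length < 12) {
            throw new IllegalArgumentException("too short to be a Parquet file");
        }
        byte[] head = Arrays.copyOfRange(file, 0, 4);
        byte[] tail = Arrays.copyOfRange(file, file.length - 4, file.length);
        if (!Arrays.equals(head, MAGIC) || !Arrays.equals(tail, MAGIC)) {
            throw new IllegalArgumentException("missing PAR1 magic");
        }
        int len = ByteBuffer.wrap(file, file.length - 8, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        // The metadata must fit between the leading magic and the trailing 8 bytes.
        if (len < 0 || len > file.length - 12) {
            throw new IllegalArgumentException("bad footer length " + len);
        }
        return len;
    }

    // Offset at which the Thrift-encoded FileMetaData begins.
    public static int footerStart(byte[] file) {
        return file.length - 8 - footerLength(file);
    }

    public static void main(String[] args) throws IOException {
        byte[] file = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Footer length: " + footerLength(file) + ", starts at byte " + footerStart(file));
    }
}
```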

A similar question, "parquet - Documentation for Apache's Parquet Java API?", can be found on Stack Overflow: https://stackoverflow.com/questions/43744059/
