
hadoop - java.lang.NullPointerException: writeSupportClass should not be null when writing Parquet files in a Spark Streaming job

Reposted · Author: 行者123 · Updated: 2023-12-02 20:55:51

In a Spark Streaming job, I save RDD data as Parquet files on Hadoop HDFS with the following snippet:

readyToSave.foreachRDD((VoidFunction<JavaPairRDD<Void, MyProtoRecord>>) rdd -> {
    Configuration configuration = rdd.context().hadoopConfiguration();
    Job job = Job.getInstance(configuration);
    ParquetOutputFormat.setWriteSupportClass(job, ProtoWriteSupport.class);
    ProtoParquetOutputFormat.setProtobufClass(job, MyProtoRecord.class);
    rdd.saveAsNewAPIHadoopFile("path-to-hdfs", Void.class, MyProtoRecord.class, ParquetOutputFormat.class, configuration);
});

I get the following exception:
java.lang.NullPointerException: writeSupportClass should not be null
at parquet.Preconditions.checkNotNull(Preconditions.java:38)
at parquet.hadoop.ParquetOutputFormat.getWriteSupport(ParquetOutputFormat.java:326)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:272)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1112)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1095)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

How can I fix this?

Best Answer

Found the problem!
When calling the saveAsNewAPIHadoopFile() method, pass the job's configuration (job.getConfiguration()) instead of the original Hadoop configuration:

readyToSave.foreachRDD((VoidFunction<JavaPairRDD<Void, MyProtoRecord>>) rdd -> {
    Configuration configuration = rdd.context().hadoopConfiguration();
    Job job = Job.getInstance(configuration);
    ParquetOutputFormat.setWriteSupportClass(job, ProtoWriteSupport.class);
    ProtoParquetOutputFormat.setProtobufClass(job, MyProtoRecord.class);
    // Pass job.getConfiguration() -- it carries the write-support settings made above
    rdd.saveAsNewAPIHadoopFile("path-to-hdfs", Void.class, MyProtoRecord.class, ParquetOutputFormat.class, job.getConfiguration());
});
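
Why this fixes it: Job.getInstance(configuration) makes a copy of the Configuration it is given, so ParquetOutputFormat.setWriteSupportClass(job, ...) only writes into the copy held by the Job. Passing the original configuration to saveAsNewAPIHadoopFile() therefore ships a Configuration in which the write-support class was never set, and the executor-side lookup returns null. Below is a minimal sketch (not from the original post) demonstrating the copy semantics; the property key parquet.write.support.class and the class name parquet.proto.ProtoWriteSupport match the pre-Apache parquet.* packages seen in the stack trace, but treat them as assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ConfigCopyDemo {
    public static void main(String[] args) throws Exception {
        Configuration original = new Configuration();
        // Job.getInstance copies the Configuration it receives
        Job job = Job.getInstance(original);

        // Roughly what ParquetOutputFormat.setWriteSupportClass(job, ProtoWriteSupport.class)
        // does: set a property on the *job's* configuration (assumed key/class names)
        job.getConfiguration().set("parquet.write.support.class",
                "parquet.proto.ProtoWriteSupport");

        // The original object never received the setting -> prints "null"
        System.out.println(original.get("parquet.write.support.class"));
        // The job's copy has it -> prints the class name
        System.out.println(job.getConfiguration().get("parquet.write.support.class"));
    }
}

An equivalent fix would be to set the properties on the Configuration you actually pass to saveAsNewAPIHadoopFile(); what matters is that the Configuration shipped to the executors contains the write-support entry.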

Regarding "hadoop - java.lang.NullPointerException: writeSupportClass should not be null when writing Parquet files in a Spark Streaming job", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44542568/
