gpt4 book ai didi

java - Spark saveAsNewAPIHadoopFile java.io.IOException : Could not find a serializer for the Value class

转载 作者:可可西里 更新时间:2023-11-01 15:27:04 31 4
gpt4 key购买 nike

我正在尝试将 java 对 RDD 存储为 Hadoop 序列文件,如下所示:

JavaPairRDD<ImmutableBytesWritable, Put> putRdd = ...
config.set("io.serializations","org.apache.hadoop.io.serializer.JavaSerialization,org.apache.hadoop.io.serializer.WritableSerialization");
putRdd.saveAsNewAPIHadoopFile(outputPath, ImmutableBytesWritable.class, Put.class, SequenceFileOutputFormat.class, config);

但即使我正在设置 io.serializations,我也会遇到异常:

2017-04-06 14:39:32,623 ERROR [Executor task launch worker-0] executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Put'. Please ensure that the configuration 'io.serializations' is properly configured, if you're usingcustom serialization.
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1192)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1094)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1030)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1014)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-04-06 14:39:32,669 ERROR [task-result-getter-0] scheduler.TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job

知道如何解决这个问题吗?

最佳答案

我找到了修复,显然 Put(和所有 HBase 突变)都有一个特定的序列化程序 Mutation Serialization

以下行解决了这个问题:

config.setStrings("io.serializations", 
config.get("io.serializations"),
MutationSerialization.class.getName(),
ResultSerialization.class.getName());

关于java - Spark saveAsNewAPIHadoopFile java.io.IOException : Could not find a serializer for the Value class,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/43255698/

31 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com