gpt4 book ai didi

scala - saveAsNewAPIHadoopFile() 在用作输出格式时出错

转载 作者:行者123 更新时间:2023-12-02 03:28:18 30 4
gpt4 key购买 nike

我正在 Spark 中运行 teragen 程序的修改版本,该程序是用 Scala 编写的。我正在尝试使用函数 saveAsNewAPIHadoopFile() 保存输出文件。相关代码如下:

dataset.map(row => (NullWritable.get(), new BytesWritable(row))).saveAsNewAPIHadoopFile(output)

代码编译成功。但是,在运行它时,出现以下错误:

Exception in thread "main" java.lang.RuntimeException: class scala.runtime.Nothing$ not org.apache.hadoop.mapreduce.OutputFormat
at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:1794)
at org.apache.hadoop.mapreduce.Job.setOutputFormatClass(Job.java:823)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:830)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:811)
at GenSort$.main(GenSort.scala:52)
at GenSort.main(GenSort.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

有没有办法让它与 saveAsNewAPIHadoopFile() 一起工作?我很乐意提供任何帮助。

最佳答案

saveAsNewAPIHadoopFile 需要键、值、格式化类。

方法签名是:

saveAsNewAPIHadoopFile(path: String,suffix: String, 
keyClass: Class[_],
valueClass: Class[_],
outputFormatClass: Class[_ <: org.apache.hadoop.mapreduce.OutputFormat[_, _]])

实现应该是:

dataset.map(row => (NullWritable.get(), new BytesWritable(row))).saveAsNewAPIHadoopFile("hdfs:\\.....","<suffix>",classOf[NullWritable],classOf[BytesWritable],classOf[org.apache.hadoop.mapreduce.lib.output.TextOutputFormat[NullWritable, BytesWritable]]))

dataset.map(row => (NullWritable.get(), new BytesWritable(row))).
saveAsNewAPIHadoopFile("hdfs:\\.....","<suffix>",
new NullWritable().getClass,new BytesWritable.getClass,
new org.apache.hadoop.mapreduce.lib.output.TextOutputFormat[NullWritable, BytesWritable].getClass))

关于scala - saveAsNewAPIHadoopFile() 在用作输出格式时出错,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/29007085/

30 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com