
java - Exception when converting CSV to ORC

Reposted · Author: 可可西里 · Updated: 2023-11-01 14:36:50

I am trying to write a MapReduce program that reads CSV input and writes it out in ORC format, but I am running into a NullPointerException.

Below is the exception stack trace I get:

java.lang.Exception: java.lang.NullPointerException
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.createTreeWriter(WriterImpl.java:1584)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:176)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:369)
at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.write(OrcNewOutputFormat.java:51)
at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.write(OrcNewOutputFormat.java:37)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
at ORCMapper.map(ORCMapper.java:22)
at ORCMapper.map(ORCMapper.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Below is the code for the MapReduce job.

/** Driver code **/

public class RunORC extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new RunORC(), args);
        System.exit(res);
    }

    public int run(String[] arg) throws Exception {
        Configuration conf = getConf();

        // Set ORC configuration parameters
        conf.set("orc.create.index", "true");

        Job job = Job.getInstance(conf);
        job.setJarByClass(RunORC.class);
        job.setJobName("ORC Output");

        job.setMapperClass(ORCMapper.class);
        // job.setReducerClass(OrcReducer.class);
        // job.setNumReduceTasks(Integer.parseInt(arg[2]));
        job.setNumReduceTasks(0);

        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Writable.class);

        // job.setOutputKeyClass(NullWritable.class);
        // job.setOutputValueClass(Writable.class);
        job.setOutputFormatClass(OrcNewOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(arg[0]));
        Path output = new Path(arg[1]);

        // OrcNewOutputFormat.setCompressOutput(job, true);
        OrcNewOutputFormat.setOutputPath(job, output);

        return job.waitForCompletion(true) ? 0 : 1;
    }
}

/** Mapper code **/

public class ORCMapper extends Mapper<LongWritable, Text, NullWritable, Writable> {

    private final OrcSerde serde = new OrcSerde();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        Writable row = serde.serialize(value, null);
        context.write(NullWritable.get(), row);
    }
}

Best answer

You are calling OrcSerde.serialize with a null second argument, and I would bet that is the cause. See here for an example: http://hadoopcraft.blogspot.com/2014/07/generating-orc-files-using-mapreduce.html
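The second argument to OrcSerde.serialize is the ObjectInspector that describes the row layout; passing null is exactly what WriterImpl.createTreeWriter (the top frame of the NPE) chokes on. A minimal sketch of a non-null call, assuming a hypothetical two-string-column row type built with Hive's TypeInfoUtils (the real column names and types depend on your CSV, and the class name OrcSerializeSketch is an illustration, not part of the question's code):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
import org.apache.hadoop.io.Writable;

public class OrcSerializeSketch {

    // Serialize one row under a hypothetical struct<col1:string,col2:string> schema.
    public static Writable serializeRow() {
        // Build a TypeInfo from a Hive type string, then derive an inspector for it.
        TypeInfo typeInfo =
                TypeInfoUtils.getTypeInfoFromTypeString("struct<col1:string,col2:string>");
        ObjectInspector inspector =
                TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(typeInfo);

        OrcSerde serde = new OrcSerde();
        // In the mapper you would split the incoming CSV line into these fields.
        List<Object> row = Arrays.<Object>asList("a", "b");
        // Non-null inspector: createTreeWriter can now walk the schema, no NPE.
        return serde.serialize(row, inspector);
    }

    public static void main(String[] args) {
        System.out.println(serializeRow() != null);
    }
}
```

In the mapper from the question, the inspector would be built once (e.g. in setup) and the Text value parsed into a List of fields before each serialize call, rather than handing the raw Text straight to the serde.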

Regarding "java - Exception when converting CSV to ORC", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36418983/
