
java - Error while executing a MapReduce program


I am new to Java and MapReduce. I have written a MapReduce program to perform a word count, but I am facing the following error:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at mapreduce.mrunit.Wordcount.main(Wordcount.java:63)

Line 63 of the code is:

FileInputFormat.setInputPaths(job, new Path(args[0]));

Below is the code I wrote:

package mapreduce.mrunit;
import java.util.StringTokenizer;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Wordcount {

    // Mapper: splits each input line into tokens and emits (word, 1).
    public static class Map extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class Reduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        @SuppressWarnings("deprecation")
        Job job = new Job(conf, "wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        // job.setInputFormatClass(TextInputFormat.class);
        // job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));   // line 63
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}

I am not able to fix this error. Please help me resolve it.

Best Answer

The error is in the following line of the main() method:

FileInputFormat.setInputPaths(job, new Path(args[0]));

From the Javadoc for ArrayIndexOutOfBoundsException:

Thrown to indicate that an array has been accessed with an illegal index. The index is either negative or greater than or equal to the size of the array.

This means that the args array of the main() method is missing elements; a minimal reproduction is sketched below.
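
For illustration only (this ArgsDemo class is hypothetical, not part of the question's code): running a main() method with no command-line arguments reproduces exactly this exception, and the 0 in the message is the illegal index that was accessed.

public class ArgsDemo {
    public static void main(String[] args) {
        // With no command-line arguments, args.length == 0,
        // so index 0 is out of bounds.
        System.out.println(args[0]); // ArrayIndexOutOfBoundsException: 0
    }
}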

According to your program, you expect it to contain 2 elements:

The first element, args[0], is the input path.

The second element, args[1], is the output path.
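
A defensive check on args.length makes this expectation explicit. The following is a minimal sketch (the usage message and exit code are my additions, not part of the original program); it fails fast with a clear message instead of an exception when a path is missing:

public static void main(String[] args) throws Exception {
    // Fail fast with a usage message if either path is missing.
    if (args.length < 2) {
        System.err.println("Usage: Wordcount <input path> <output path>");
        System.exit(2);
    }
    // ... job setup continues as in the question ...
}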

Please create an input directory and put a text file with some lines into it (for example, with hdfs dfs -mkdir -p and hdfs dfs -put). Note that you should not create the output directory, although you may create its parent; MapReduce creates it automatically.
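
As a development convenience, a sketch like the following (assuming the conf and job variables from the question's main(), plus an import of org.apache.hadoop.fs.FileSystem) deletes a leftover output directory before submitting the job, since MapReduce refuses to run if the output path already exists:

Path outputPath = new Path(args[1]);
FileSystem fs = FileSystem.get(conf);
if (fs.exists(outputPath)) {
    // Recursively remove the previous run's output.
    fs.delete(outputPath, true);
}
FileOutputFormat.setOutputPath(job, outputPath);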

So, assuming your paths are

inputPath = /user/cloudera/wordcount/input
outputPath = /user/cloudera/wordcount

then execute the program like this:

hadoop jar wordcount.jar mapreduce.mrunit.Wordcount /user/cloudera/wordcount/input /user/cloudera/wordcount/output

Note that I appended an output folder to the program's second argument to comply with the restriction that the output path must not already exist; it will be created by the program at runtime.

Finally, I suggest following this tutorial, which contains step-by-step instructions for executing the WordCount program.

Regarding "java - Error while executing a MapReduce program", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49301936/
