
hadoop - Chaining of MapReduce jobs

Reposted · Author: 可可西里 · Updated: 2023-11-01 14:42:34

I came across the term "chaining of MapReduce jobs". As a newcomer to MapReduce, in what situations do we have to chain jobs (I assume chaining means running MapReduce jobs one after another)?

Are there any examples that would help?

Best answer

The classic example of jobs that have to be chained is a word count that outputs the words sorted by their frequency.

You need:

Job 1:

  • A mapper over the input source (emits each word as the key and one as the value)
  • An aggregating reducer (sums the counts for each word)

Job 2:

  • A key/value swapping mapper (emits the frequency as the key and the word as the value)
  • An implicit identity reducer (receives the words sorted by frequency; you do not have to implement it, see the sketch below)
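
For reference, the "implicit identity reducer" is just what Hadoop's base Reducer class already does: it writes every key/value pair through unchanged, which is why job 2 never sets a reducer class. An explicit equivalent would look like the following sketch (the class name IdentityReducer is only illustrative and is not part of the code below):

// Illustrative only: mirrors the default behaviour of Hadoop's base Reducer,
// which is why job 2 in the driver below does not set a reducer class at all.
public static class IdentityReducer extends Reducer<LongWritable, Text, LongWritable, Text> {

    @Override
    public void reduce(LongWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            context.write(key, value); // pass each (frequency, word) pair through unchanged
        }
    }
}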

Here are example implementations of the mappers/reducers listed above:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class HadoopWordCount {

    // Job 1 mapper: emits (word, 1) for every token in the input.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, LongWritable> {

        private final Text word = new Text();
        private final static LongWritable one = new LongWritable(1);

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Job 2 mapper: swaps key and value so the frequency becomes the sort key.
    public static class KeyValueSwappingMapper extends Mapper<Text, LongWritable, LongWritable, Text> {

        @Override
        public void map(Text key, LongWritable value, Context context) throws IOException, InterruptedException {
            context.write(value, key);
        }
    }

    // Job 1 reducer (also used as the combiner): sums the counts for each word.
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

        private final LongWritable result = new LongWritable();

        @Override
        public void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // The driver's main() method shown below also belongs in this class.

Here is an example driver.

It takes two arguments:

  1. An input text file whose words are to be counted.
  2. An output directory (which must not already exist); look for the result in the {this dir}/out2/part-r-00000 file.
    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
        Path out = new Path(args[1]);

        // Job 1: classic word count; the output is a SequenceFile so job 2 can read typed keys/values.
        Job job1 = Job.getInstance(conf, "word count");
        job1.setJarByClass(HadoopWordCount.class);
        job1.setMapperClass(TokenizerMapper.class);
        job1.setCombinerClass(SumReducer.class);
        job1.setReducerClass(SumReducer.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(LongWritable.class);
        job1.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, new Path(out, "out1"));
        if (!job1.waitForCompletion(true)) {
            System.exit(1);
        }

        // Job 2: swap key and value, then let the shuffle sort by frequency in descending order.
        // No reducer is set, so Hadoop's implicit identity reducer writes the sorted pairs out.
        Job job2 = Job.getInstance(conf, "sort by frequency");
        job2.setJarByClass(HadoopWordCount.class);
        job2.setMapperClass(KeyValueSwappingMapper.class);
        job2.setNumReduceTasks(1);
        job2.setSortComparatorClass(LongWritable.DecreasingComparator.class);
        job2.setOutputKeyClass(LongWritable.class);
        job2.setOutputValueClass(Text.class);
        job2.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job2, new Path(out, "out1"));
        FileOutputFormat.setOutputPath(job2, new Path(out, "out2"));
        if (!job2.waitForCompletion(true)) {
            System.exit(1);
        }
    }
}
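
The driver above chains the two jobs simply by waiting for job 1 to finish before submitting job 2. As an alternative way to express the same dependency, Hadoop also ships JobControl and ControlledJob; the following is only a rough sketch of how the two waitForCompletion() blocks could be replaced, assuming job1 and job2 are configured exactly as in main() above:

// Sketch only: chaining the same two jobs with JobControl instead of
// sequential waitForCompletion() calls. Extra imports this would need:
//   import java.util.Collections;
//   import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
//   import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

ControlledJob controlledJob1 = new ControlledJob(job1, null);
ControlledJob controlledJob2 = new ControlledJob(job2, Collections.singletonList(controlledJob1));

JobControl jobControl = new JobControl("word-count-chain");
jobControl.addJob(controlledJob1);
jobControl.addJob(controlledJob2);

// JobControl is a Runnable; run it on its own thread and poll until both jobs have finished.
Thread controlThread = new Thread(jobControl);
controlThread.start();
while (!jobControl.allFinished()) {
    Thread.sleep(1000);
}
jobControl.stop();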

For more on hadoop - chaining of MapReduce jobs, see the similar question found on Stack Overflow: https://stackoverflow.com/questions/38111700/
