java - How do I implement a combiner in Hadoop MapReduce?

Reposted by: 可可西里 | Updated: 2023-11-01 15:02:51

I know that to include a combiner in a Hadoop MapReduce job, I need the following line (which I have already added):

   conf.setCombinerClass(MyReducer.class);

What I don't understand is where I actually implement the combiner's logic. Do I create a combine{} method in MyReducer, alongside the reduce method?

  public void reduce(Text key, Iterator<IntWritable> values,
  OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException { }

Many thanks!

Best answer

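A Combiner is simply a Reducer, so it implements the Reducer interface (there is no Combiner interface). There is no separate combine{} method to write: the framework calls the class's reduce method during the combine step. Think of the combine step as an intermediate reduce step between the Mapper and the Reducer.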
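To make the "intermediate reduce step" concrete, here is a minimal sketch in plain Java that does not use the Hadoop API at all. It assumes the reduce logic is a simple sum, and the names `CombinerSketch`, `groupAndReduce`, and `run` are hypothetical, chosen only for illustration. The point is that the same `reduce` logic runs twice: once on each map node's local output (the combine step) and once on the shuffled data (the final reduce).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {

    // Hypothetical stand-in for the logic inside MyReducer.reduce: sum the values of one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Groups (key, value) pairs by key and applies reduce() to each group.
    // Used unchanged for both the combine stage and the final reduce stage.
    static Map<String, Integer> groupAndReduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            out.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    // Simulated map output of one node: one (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String text) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // Runs the simulated job, optionally combining each node's output before the shuffle.
    static Map<String, Integer> run(List<String> splits, boolean useCombiner) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapped = map(split);
            if (useCombiner) {
                // Combine stage: the same reduce logic, applied to one node's local output.
                shuffled.addAll(groupAndReduce(mapped).entrySet());
            } else {
                shuffled.addAll(mapped);
            }
        }
        // Final reduce stage over everything that crossed the shuffle.
        return groupAndReduce(shuffled);
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("cat dog cat", "cat bird");
        System.out.println(run(splits, false)); // {bird=1, cat=3, dog=1}
        System.out.println(run(splits, true));  // {bird=1, cat=3, dog=1}
    }
}
```

Both runs give the same counts. In the real job, the equivalent of the `useCombiner` switch is the `conf.setCombinerClass(MyReducer.class)` line from the question.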

Take word count as an example. From Yahoo's tutorial:

Word count is a prime example for where a Combiner is useful. The Word Count program in listings 1--3 emits a (word, 1) pair for every instance of every word it sees. So if the same document contains the word "cat" 3 times, the pair ("cat", 1) is emitted three times; all of these are then sent to the Reducer. By using a Combiner, these can be condensed into a single ("cat", 3) pair to be sent to the Reducer. Now each node only sends a single value to the reducer for each word -- drastically reducing the total bandwidth required for the shuffle process, and speeding up the job. The best part of all is that we do not need to write any additional code to take advantage of this! If a reduce function is both commutative and associative, then it can be used as a Combiner as well.
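The last sentence of the quote is the important caveat. As a rough illustration (plain Java again, not Hadoop code, with the hypothetical class name `CombinerSafety` and made-up sample values), the sketch below compares a sum, which can be safely reused as a combiner, with a mean, which cannot: taking a mean of per-node means gives a different answer from the mean over all values.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSafety {

    // Sum is commutative and associative: partial sums can simply be summed again.
    static double sum(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // Mean is not associative: a mean of partial means weights the groups incorrectly.
    static double mean(List<Double> values) {
        return sum(values) / values.size();
    }

    public static void main(String[] args) {
        // Values for one key as emitted on two hypothetical map nodes.
        List<Double> nodeA = Arrays.asList(1.0, 2.0, 3.0);
        List<Double> nodeB = Arrays.asList(10.0);
        List<Double> all = Arrays.asList(1.0, 2.0, 3.0, 10.0);

        // Sum reused as combiner: combining per node first gives the same answer.
        System.out.println(sum(all) + " vs " + sum(Arrays.asList(sum(nodeA), sum(nodeB))));

        // Mean reused as combiner: the answer changes, so it must not be reused as-is.
        System.out.println(mean(all) + " vs " + mean(Arrays.asList(mean(nodeA), mean(nodeB))));
    }
}
```

With these sample values the first line prints `16.0 vs 16.0` and the second prints `4.0 vs 6.0`. This is why a word-count reducer can double as its own combiner, while a reducer that computes an average needs a different combine step.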

Hope this helps.

This question, "java - How do I implement a combiner in Hadoop MapReduce?", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/22379379/