
java - How do I implement a combiner in Hadoop MapReduce?

Reposted. Author: 可可西里. Updated: 2023-11-01 15:02:51

I know that to include a combiner in a Hadoop MapReduce job, the following line is needed (I have already added it):

   conf.setCombinerClass(MyReducer.class);

What I don't understand is where I actually implement the combiner's logic. Do I create a combine{} method inside MyReducer, alongside the reduce method?

   public void reduce(Text key, Iterator<IntWritable> values,
                      OutputCollector<Text, IntWritable> output, Reporter reporter)
           throws IOException { }

Thanks very much!

Best answer

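A Combiner is simply a Reducer, so it implements the Reducer interface (there is no separate Combiner interface). There is no combine{} method to write: the combine logic lives in the reduce method of whatever class you pass to setCombinerClass. Think of the combine step as an intermediate reduce step that runs between the Mapper and the Reducer.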
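As a minimal sketch of what that could look like with the old `org.apache.hadoop.mapred` API used in the question: the body of `reduce` below is an assumed word-count-style sum, since the question leaves it empty, and it needs the Hadoop libraries on the classpath to compile.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// The same class can be registered as both combiner and reducer.
// Its input and output types match, so its output can be fed back into reduce.
public class MyReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Assumed logic: sum all partial counts seen for this key.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}

// In the job driver (conf is the JobConf from the question):
//   conf.setCombinerClass(MyReducer.class);
//   conf.setReducerClass(MyReducer.class);
```

Hadoop may run the combiner zero, one, or several times on a mapper's output, so the job should produce the same result whether or not it runs.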

Take word count as an example. From Yahoo's tutorial:

Word count is a prime example for where a Combiner is useful. The Word Count program in listings 1--3 emits a (word, 1) pair for every instance of every word it sees. So if the same document contains the word "cat" 3 times, the pair ("cat", 1) is emitted three times; all of these are then sent to the Reducer. By using a Combiner, these can be condensed into a single ("cat", 3) pair to be sent to the Reducer. Now each node only sends a single value to the reducer for each word -- drastically reducing the total bandwidth required for the shuffle process, and speeding up the job. The best part of all is that we do not need to write any additional code to take advantage of this! If a reduce function is both commutative and associative, then it can be used as a Combiner as well.
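The effect described in the quote can be illustrated without Hadoop at all. The following is a plain-Java simulation, not Hadoop code: the class `CombinerSketch`, its methods, and the two sample documents are all hypothetical, chosen only to show that summing locally before the shuffle sends fewer pairs and still yields the same totals.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone simulation of map -> combine -> reduce for word count.
final class CombinerSketch {

    // Map step: emit one (word, 1) pair per word occurrence in a document.
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : document.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Sum the values per key. Addition is commutative and associative,
    // so this one function can play both the combiner and the reducer role.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Two hypothetical map tasks, each processing one document.
        List<Map.Entry<String, Integer>> nodeA = map("cat dog cat cat");
        List<Map.Entry<String, Integer>> nodeB = map("dog cat");

        // Combine locally on each node before the shuffle.
        Map<String, Integer> combinedA = sumByKey(nodeA);
        Map<String, Integer> combinedB = sumByKey(nodeB);

        // Shuffle: only the combined pairs travel to the reducer.
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>(combinedA.entrySet());
        shuffled.addAll(combinedB.entrySet());

        System.out.println("pairs emitted by mappers: " + (nodeA.size() + nodeB.size()));
        System.out.println("pairs sent after combining: " + shuffled.size());
        System.out.println("final counts: " + sumByKey(shuffled));
    }
}
```

With these two sample documents, the mappers emit 6 pairs but only 4 cross the shuffle, and the final counts are `{cat=4, dog=2}` either way.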

Hope this helps.

Regarding "java - How do I implement a combiner in Hadoop MapReduce?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22379379/
