
hadoop - Difference between combiner and partitioner

Reposted. Author: 可可西里. Updated: 2023-11-01 14:15:43

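I am new to MapReduce and I can't figure out the difference between a partitioner and a combiner. I know that both run in the intermediate step between the map and reduce tasks, and that both reduce the amount of data the reduce tasks have to process. Please explain the difference with an example.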

Best answer

First of all, I agree with @Binary nerd's comment.

A Combiner can be viewed as a mini-reducer in the map phase. It performs a local reduce on the mapper's output before that output is distributed further. Once the Combiner has run, its output is passed on to the Reducer for further work.
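The local-reduce idea can be sketched in plain Python, without Hadoop. This is only an illustration under assumptions: `map_words` and `combine` are hypothetical helper names, not Hadoop APIs, and word count is used as the example job.

```python
from collections import Counter

def map_words(line):
    # Mapper (hypothetical): emit one (word, 1) pair per word in the line.
    return [(word, 1) for word in line.split()]

def combine(pairs):
    # Combiner (hypothetical): sum the counts per key on the mapper's side,
    # so fewer pairs have to be sent on to the reducers.
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

mapper_output = map_words("to be or not to be")   # 6 pairs
combined = combine(mapper_output)                 # 4 pairs, one per distinct word
```

The combiner does not change the final result here, because addition is associative and commutative; it only shrinks the data that leaves the mapper.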

A Partitioner, in contrast, comes into the picture when there is more than one Reducer. The partitioner decides which reducer is responsible for a particular key. It takes the Mapper output (or the Combiner output, if a Combiner is used) and sends it to the responsible Reducer based on the key.
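A rough plain-Python sketch of key-based routing follows. It is not Hadoop code: `partition` and `shuffle` are hypothetical names, and `zlib.crc32` stands in for the key's hash purely so the result is deterministic across runs. Hadoop's default `HashPartitioner` applies the same idea, taking the key's hash modulo the number of reduce tasks.

```python
import zlib

def partition(key, num_reducers):
    # Hypothetical hash partitioner: hash the key, then take it modulo
    # the number of reducers. crc32 is an assumed stand-in hash function.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(pairs, num_reducers):
    # Route every (key, value) pair to the bucket of its responsible reducer.
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets

pairs = [("be", 2), ("not", 1), ("or", 1), ("to", 2), ("be", 1)]
buckets = shuffle(pairs, 3)
```

Whatever the bucket sizes turn out to be, every pair with the same key lands in the same bucket, so a single reducer sees all values for that key. No pair is dropped or merged: the partitioner only routes data, it does not reduce it.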

Scenario with both a Combiner and a Partitioner: [image]

Scenario with only a Partitioner:

[image]

Examples:

  • Combiner Example

  • Partitioner example:

    The partitioning phase takes place after the map phase and before the reduce phase. The number of partitions is equal to the number of reducers. The data is partitioned across the reducers according to the partitioning function. The difference between a partitioner and a combiner is that the partitioner divides the data according to the number of reducers, so that all the data in a single partition is processed by a single reducer. The combiner, however, functions similarly to the reducer and processes the data in each partition. The combiner is an optimization to the reducer. The default partitioning function is the hash partitioning function, where the hashing is done on the key. However, it might be useful to partition the data according to some other function of the key or the value. -- Source
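As an illustration of partitioning by "some other function of the key", here is a minimal plain-Python sketch. The function `partition_by_initial` and its rule (alphabetical ranges of the key's first letter) are assumptions made up for this example, not part of Hadoop.

```python
def partition_by_initial(key, num_reducers):
    # Hypothetical custom partitioner: split the alphabet into num_reducers
    # contiguous ranges and route each key by its first letter.
    if not key:
        return 0
    first = key[0].lower()
    if not ("a" <= first <= "z"):
        return 0  # assumed fallback: keys not starting with a-z go to reducer 0
    return (ord(first) - ord("a")) * num_reducers // 26

# With 2 reducers, keys starting with a-m go to reducer 0 and n-z to reducer 1.
assignments = {k: partition_by_initial(k, 2) for k in ["apple", "mango", "nut", "Zebra"]}
```

A range-based rule like this keeps alphabetically adjacent keys on the same reducer, which a hash function would not do, at the cost of uneven load if the keys are skewed toward certain letters.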

Regarding hadoop - Difference between combiner and partitioner, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38562889/
