
java - Map Reduce flow in Hadoop

Reposted. Author: 可可西里. Updated: 2023-11-01 14:54:17

I am learning Hadoop from the book Hadoop in Practice, and while reading Chapter 1 I came across this diagram:

[Figure: diagram from Chapter 1 of Hadoop in Practice]

From the Hadoop documentation (http://hadoop.apache.org/docs/current2/api/org/apache/hadoop/mapred/Reducer.html):

1. Shuffle

Reducer is input the grouped output of a Mapper. In the phase the framework, for each Reducer, fetches the relevant partition of the output of all the Mappers, via HTTP.

2. Sort

The framework groups Reducer inputs by keys (since different Mappers may have output the same key) in this stage. The shuffle and sort phases occur simultaneously i.e. while outputs are being fetched they are merged.

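While I understand that shuffle and sort happen simultaneously, it is not clear to me how the framework decides which reducer receives which mapper output. From the documentation, it seems that each reducer has a way of knowing which map outputs to collect, but I don't understand how.

So my question is: given the mapper outputs above, is the final result for each reducer always the same? If so, what are the steps that lead to that result?

Thanks for any clarification!

Best answer

It is the Partitioner that decides how the mappers' output is distributed to the different reducers.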

Partitioner controls the partitioning of the keys of the intermediate map-outputs. The key (or a subset of the key) is used to derive the partition, typically by a hash function. The total number of partitions is the same as the number of reduce tasks for the job. Hence this controls which of the m reduce tasks the intermediate key (and hence the record) is sent for reduction.
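
To make the quoted description concrete, here is a minimal sketch of hash partitioning in plain Java, with no Hadoop dependency. The class name PartitionSketch, the helper assign, and the sample keys are hypothetical and only for illustration. The formula in getPartition mirrors the one used by Hadoop's default HashPartitioner, but this sketch hashes java.lang.String keys, so the partition numbers it produces will generally not match what a real job computes for Text keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Hadoop.
final class PartitionSketch {

    // Mask off the sign bit so the result is never negative, then take the
    // remainder modulo the number of reduce tasks. The result is always in
    // the range [0, numReduceTasks).
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Group a list of intermediate keys by the reducer index they map to.
    // Duplicate keys land in the same list because the partition depends
    // only on the key and the number of reduce tasks.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            int partition = getPartition(key, numReduceTasks);
            byReducer.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        // Sample keys are made up; imagine they were emitted by several mappers.
        List<String> keys = List.of("cat", "dog", "cat", "hamster", "dog");
        System.out.println(assign(keys, 2));
    }
}
```

Because the partition is a pure function of the key and the number of reduce tasks, every mapper computes the same reducer index for the same key, provided the key's hashCode is stable across JVMs. That is why each reducer can fetch "its" partition from every mapper and end up with all records for the keys assigned to it.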

Regarding java - Map Reduce flow in Hadoop, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20916258/
