
java - When should I use OutputCollector and Context in Hadoop?

Reposted. Author: 行者123. Updated: 2023-12-02 20:49:13

In this article I found the following mapper code for word count:

  public static class MapClass extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, one);
      }
    }
  }

In contrast, this is the mapper provided in the official tutorial:
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

So far, I have only ever seen Context used to write things from the mapper to the reducer; I have never seen (or used) OutputCollector. I have read the documentation, but I don't understand its use or why I would use it.

Best answer

The two snippets use different MapReduce APIs: OutputCollector belongs to MRV1 (the old org.apache.hadoop.mapred API), while Context belongs to MRV2 (the new org.apache.hadoop.mapreduce API).

The Java MapReduce API 1, also known as MRV1, was released with the initial Hadoop versions. The flaw in these early versions was that the MapReduce framework performed both processing and cluster resource management.

MapReduce 2, or Next Generation MapReduce, was a long-awaited and much-needed upgrade to scheduling, resource management, and execution in Hadoop. Fundamentally, the improvements separate cluster resource management from MapReduce-specific logic; this separation of processing and resource management was achieved through the introduction of YARN in later versions of Hadoop.


MRV1 uses OutputCollector and Reporter to communicate with the MapReduce system.
MRV2 makes extensive use of the Context object, which lets user code communicate with the MapReduce system. (The roles of JobConf, OutputCollector, and Reporter from the old API are unified by the Context object in MRV2.)
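To make that unification concrete, here is a minimal, self-contained sketch. The interfaces below are simplified mocks I made up for illustration, not the real Hadoop classes: in the old API shape, the map method receives an output collector and a reporter as separate arguments, while in the new API shape a single context object carries both responsibilities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class ApiShapeSketch {

    // Old API (MRV1) shape: output and progress reporting are separate parameters.
    interface MockOutputCollector { void collect(String key, int value); }
    interface MockReporter { void incrCounter(String name, long amount); }

    // New API (MRV2) shape: one context object carries both responsibilities.
    interface MockContext {
        void write(String key, int value);          // plays the OutputCollector role
        void incrCounter(String name, long amount); // plays the Reporter role
    }

    // MRV1-style map: the caller passes two separate collaborators.
    static void mapOld(String line, MockOutputCollector output, MockReporter reporter) {
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            output.collect(itr.nextToken(), 1);
            reporter.incrCounter("WORDS", 1);
        }
    }

    // MRV2-style map: identical logic, but everything goes through the context.
    static void mapNew(String line, MockContext context) {
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            context.write(itr.nextToken(), 1);
            context.incrCounter("WORDS", 1);
        }
    }

    public static void main(String[] args) {
        List<String> oldOut = new ArrayList<>();
        long[] oldWords = {0};
        mapOld("to be or not to be",
               (k, v) -> oldOut.add(k + "=" + v),
               (n, a) -> oldWords[0] += a);

        List<String> newOut = new ArrayList<>();
        long[] newWords = {0};
        mapNew("to be or not to be", new MockContext() {
            public void write(String k, int v) { newOut.add(k + "=" + v); }
            public void incrCounter(String n, long a) { newWords[0] += a; }
        });

        // Both API shapes emit identical (word, 1) pairs and counter totals.
        System.out.println(oldOut.equals(newOut) && oldWords[0] == newWords[0]);
    }
}
```

Either way the mapper's logic is unchanged; only the plumbing it talks to differs, which is why migrating word count from MRV1 to MRV2 touches the method signature but not the tokenizing loop.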
You should use MapReduce 2 (MRV2). Here are the biggest advantages of Hadoop 2 over Hadoop 1:
  • A major advantage is the separation of processing and resource management in
    the Hadoop 2 architecture: we have the YARN ResourceManager and NodeManagers
    instead. This lets Hadoop 2 support execution engines other than the
    MapReduce framework and overcomes the high-latency problems associated
    with MapReduce.
  • Hadoop 2 supports non-batch as well as traditional batch operations.
  • HDFS federation was introduced in Hadoop 2. This allows multiple
    NameNodes to control a Hadoop cluster, addressing Hadoop's
    single-point-of-failure problem.

  • MRV2 has many more advantages: https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/

    Regarding "java - When should I use OutputCollector and Context in Hadoop?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46623041/
