
java - Mapper and Reducer classes in a MapReduce design pattern


I am new to MapReduce, and I have some doubts about the design of the Mapper and Reducer classes in this code.

I am familiar with map-side joining in MapReduce, and this is what I learned:

public static class CustsMapper extends Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

Here, in the snippet above, I understand that we extend our class from the Mapper class, that Object is the key and Text is the value, and that the map method therefore takes this key-value pair as input, while the context object serves as the output here via context.write(new Text(), new Text()), according to how the body of the method logic is designed.
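
To make that concrete, here is a minimal sketch of how the body of such a mapper is often completed; the comma-separated record layout and the field indices are assumptions for illustration, not the original code:

public static class CustsMapper extends Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Hypothetical record layout: "custId,name" -- not the original input format.
        String[] fields = value.toString().split(",");
        // Emit the join key as the output key and the remaining payload as the value.
        context.write(new Text(fields[0]), new Text(fields[1]));
    }
}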

My two questions are:

  1. Why do we extend our class from MapReduceBase (what does it do?), and why do we make our class implement Mapper? (I know Mapper is a class, but somewhere on the web it is shown as an interface, so what would be the problem if I instead extended the org.apache.hadoop.mapreduce.Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT> class?)

  2. In the map function, what are OutputCollector<Text, IntWritable> output and Reporter reporter? I do not know them. I knew that Context context should be here, but what are OutputCollector and Reporter doing here?

I am executing the program given below:

Input:

1979 23 23 2 43 24 25 26 26 26 26 25 26 25
1980 26 27 28 28 28 30 31 31 31 30 30 30 29
1981 31 32 32 32 33 34 35 36 36 34 34 34 34
1984 39 38 39 39 39 41 42 43 40 39 38 38 40
1985 38 39 39 39 39 41 41 41 00 40 39 39 45

Code:

package hadoop; 

import java.util.*;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class ProcessUnits
{
   //Mapper class
   public static class E_EMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
   {
      //Map function
      public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
      {
         String line = value.toString();
         String lasttoken = null;
         StringTokenizer s = new StringTokenizer(line, "\t");
         String year = s.nextToken();

         while (s.hasMoreTokens())
         {
            lasttoken = s.nextToken();
         }

         int avgprice = Integer.parseInt(lasttoken);
         output.collect(new Text(year), new IntWritable(avgprice));
      }
   }


   //Reducer class
   public static class E_EReduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable>
   {
      //Reduce function: emits every value greater than maxavg for this key
      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
      {
         int maxavg = 30;
         int val = Integer.MIN_VALUE;

         while (values.hasNext())
         {
            if ((val = values.next().get()) > maxavg)
            {
               output.collect(key, new IntWritable(val));
            }
         }
      }
   }


   //Main function
   public static void main(String args[]) throws Exception
   {
      JobConf conf = new JobConf(ProcessUnits.class);

      conf.setJobName("max_eletricityunits");
      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(IntWritable.class);
      conf.setMapperClass(E_EMapper.class);
      conf.setCombinerClass(E_EReduce.class);
      conf.setReducerClass(E_EReduce.class);
      conf.setInputFormat(TextInputFormat.class);
      conf.setOutputFormat(TextOutputFormat.class);

      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));

      JobClient.runJob(conf);
   }
}

Output:

1981 34
1984 40
1985 45

Best Answer

Why we have extended our class to MapReduceBase(what it does?) and why we have implemented our class to Mapper

Because this is old code, written against the mapred API from before Hadoop 2.x existed.
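
For comparison, here is a minimal sketch of the same E_EMapper written against the newer org.apache.hadoop.mapreduce API, where a single Context parameter replaces both OutputCollector and Reporter (illustrative only; the fully qualified base class is used to avoid clashing with the org.apache.hadoop.mapred.* wildcard import in the program above):

   //Sketch: the same map logic on the new (mapreduce) API
   public static class E_EMapper
         extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable>
   {
      @Override
      protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException
      {
         StringTokenizer s = new StringTokenizer(value.toString(), "\t");
         String year = s.nextToken();
         String lasttoken = null;

         while (s.hasMoreTokens())
         {
            lasttoken = s.nextToken();
         }

         //context.write replaces output.collect from the old API
         context.write(new Text(year), new IntWritable(Integer.parseInt(lasttoken)));
      }
   }

The driver changes in the same spirit: org.apache.hadoop.mapreduce.Job replaces JobConf/JobClient when the new API is used.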

I knew that Context context should be here but what is OutputCollector and Reporter here

They are the earlier versions of the context object.

Hadoop: How does OutputCollector work during MapReduce?
How outputcollector works?
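
To make the mapping concrete, here is a hedged sketch of E_EReduce rewritten on the new API: the Iterator/OutputCollector/Reporter trio collapses into an Iterable plus a single Context parameter (illustrative only, not part of the original program):

   //Sketch: the same reduce logic on the new (mapreduce) API
   public static class E_EReduce
         extends org.apache.hadoop.mapreduce.Reducer<Text, IntWritable, Text, IntWritable>
   {
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException
      {
         int maxavg = 30;

         for (IntWritable value : values)
         {
            if (value.get() > maxavg)
            {
               //context.write replaces OutputCollector.collect; Context also carries
               //the progress and counter duties Reporter used to handle, e.g.
               //context.getCounter(...) and context.setStatus(...)
               context.write(key, value);
            }
         }
      }
   }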

Regarding java - Mapper and Reducer classes in a MapReduce design pattern, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45896261/
