
java - Parallel MapReduce

Reposted. Author: 行者123. Updated: 2023-12-02 13:17:23

I am new to parallel programming and Hadoop MapReduce. The following example is taken from a tutorial site:

https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm

How does MapReduce apply parallel programming to the Mapper and Reducer so that they can run together, and how would multithreading be introduced here?

Is it possible to run the Mapper on one machine and the Reducer on another machine at the same time?

Apologies if I haven't explained this well.

package hadoop;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class ProcessUnits
{
    // Mapper: emits (year, units) for each tab-separated input line
    public static class E_EMapper extends MapReduceBase implements
            Mapper<LongWritable,  /* input key type    */
                   Text,          /* input value type  */
                   Text,          /* output key type   */
                   IntWritable>   /* output value type */
    {
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException
        {
            String line = value.toString();
            StringTokenizer s = new StringTokenizer(line, "\t");
            String year = s.nextToken();      // first column: year
            String lasttoken = null;
            while (s.hasMoreTokens())         // last column: average units
            {
                lasttoken = s.nextToken();
            }
            int avgprice = Integer.parseInt(lasttoken);
            output.collect(new Text(year), new IntWritable(avgprice));
        }
    }

    // Reducer: for each year, emits only the values above the threshold
    public static class E_EReduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable>
    {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException
        {
            final int maxavg = 30;
            while (values.hasNext())
            {
                int val = values.next().get();
                if (val > maxavg)
                {
                    output.collect(key, new IntWritable(val));
                }
            }
        }
    }

    // Job driver: configures and submits the job
    public static void main(String[] args) throws Exception
    {
        JobConf conf = new JobConf(ProcessUnits.class);

        conf.setJobName("max_eletricityunits");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(E_EMapper.class);
        conf.setCombinerClass(E_EReduce.class);
        conf.setReducerClass(E_EReduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

Best answer

Hadoop takes care of parallelizing the work for you; beyond running `hadoop jar`, you don't need to do anything yourself.

Regarding MapReduce in general, keep in mind that the map phase and the reduce phase happen sequentially (not in parallel), because reduce depends on the results of map. However, you can have multiple mappers running in parallel, and once those finish, multiple reducers running in parallel (depending on the task, of course). Again, Hadoop takes care of launching and coordinating all of this for you.
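The sequential-phases point above can be sketched in plain Java, outside Hadoop. This is not the Hadoop API; the class `ParallelPhases` and the word-count logic are illustrative only. Map tasks run concurrently on a thread pool, and the reduce step starts only after every map task has completed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.HashMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPhases {

    // "map": count words in a single line (runs concurrently per line)
    static Map<String, Integer> mapLine(String line) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : line.split("\\s+")) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    // "reduce": merge all partial counts (runs only after the map phase)
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> p : partials) {
            p.forEach((w, c) -> total.merge(w, c, Integer::sum));
        }
        return total;
    }

    public static Map<String, Integer> run(List<String> lines) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Map<String, Integer>>> futures = new ArrayList<>();
        for (String line : lines) {
            futures.add(pool.submit(() -> mapLine(line))); // map tasks in parallel
        }
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (Future<Map<String, Integer>> f : futures) {
            partials.add(f.get()); // barrier: wait for every mapper to finish
        }
        pool.shutdown();
        return reduce(partials);   // reduce begins only after the map phase
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(List.of("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

In Hadoop the same barrier exists between the map and reduce tasks (modulo the shuffle, which can copy map output while other maps still run), and the framework, not your code, schedules the tasks across machines.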

[Figure: the MapReduce phases — parallel map tasks followed by reduce tasks]

Regarding java - Parallel MapReduce, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43717924/
