gpt4 book ai didi

sorting - 在具有零化简节点的 Mapreduce 中实现简单排序程序时出错

转载 作者:可可西里 更新时间:2023-11-01 16:27:19 25 4
gpt4 key购买 nike

我尝试在 mapreduce 中实现一个排序程序,以便在 map 阶段之后我只有排序后的输出,其中排序由 hadoop 框架在内部完成。为此,我尝试将 reduce task 的数量设置为零,因为不需要任何减少。现在,当我尝试执行该程序时,我一直在获取校验和错误.. 我不知道接下来要做什么。当然可以在我的上网本上运行该程序,因为当我将 reduce 任务设置为一个时,排序工作正常。请帮忙!!


以下是我为执行排序而编写的完整代码,供您引用:

    /*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/

/**
*
* @author root
*/
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.io.*;
import java.util.*;
import java.io.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.*;
import org.apache.hadoop.conf.*;


public class word extends Configured implements Tool
{
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
private static IntWritable one=new IntWritable(1);
private Text word=new Text();

public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter report) throws IOException
{
String line=value.toString();
StringTokenizer token=new StringTokenizer(line," .,?!");
String wordToken=null;

while(token.hasMoreTokens())
{
wordToken=token.nextToken();
output.collect(new Text(wordToken), one);

}
}

}

public int run(String args[])throws Exception
{
//Configuration conf=getConf();
JobConf job=new JobConf(word.class);
job.setInputFormat(TextInputFormat.class);

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setOutputFormat(TextOutputFormat.class);
job.setMapperClass(Map.class);
job.setNumReduceTasks(0);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

JobClient.runJob(job);

return 0;
}

public static void main(String args[])throws Exception
{
int exitCode=ToolRunner.run(new word(), args);
System.exit(exitCode);

}
}

这是我在执行这个程序时得到的校验和错误:

12/03/25 10:26:42 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
12/03/25 10:26:43 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/25 10:26:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/25 10:26:44 INFO mapred.FileInputFormat: Total input paths to process : 1
12/03/25 10:26:45 INFO mapred.JobClient: Running job: job_local_0001
12/03/25 10:26:45 INFO mapred.FileInputFormat: Total input paths to process : 1
12/03/25 10:26:45 INFO mapred.MapTask: numReduceTasks: 0
12/03/25 10:26:45 INFO fs.FSInputChecker: Found checksum error: b[0, 26]=610a630a620a640a650a740a790a780a730a670a7a0a680a730a
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/root/NetBeansProjects/projectAll/output/regionMulti/individual/part-00000 at 0
at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:136)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:40)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/25 10:26:45 WARN mapred.LocalJobRunner: job_local_0001
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/root/NetBeansProjects/projectAll/output/regionMulti/individual/part-00000 at 0
at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:136)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:40)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/25 10:26:46 INFO mapred.JobClient: map 0% reduce 0%
12/03/25 10:26:46 INFO mapred.JobClient: Job complete: job_local_0001
12/03/25 10:26:46 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
at sortLog.run(sortLog.java:59)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at sortLog.main(sortLog.java:66)
Java Result: 1
BUILD SUCCESSFUL (total time: 4 seconds)

最佳答案

所以看看 org.apache.hadoop.mapred.MapTask around line 600 in 0.20.2。

  // get an output object
if (job.getNumReduceTasks() == 0) {
output =
new NewDirectOutputCollector(taskContext, job, umbilical, reporter);
} else {
output = new NewOutputCollector(taskContext, job, umbilical, reporter);
}

如果将 reduce 任务的数量设置为零,它将直接写入输出。 NewOutputCollector 将使用所谓的 MapOutputBuffer 执行溢出、排序、组合和分区。

因此,如果您没有设置 reducer,则不会发生任何排序,即使 Tom White 在权威指南中指出了这一点。

关于sorting - 在具有零化简节点的 Mapreduce 中实现简单排序程序时出错,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/9849682/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com