java - Debugging in Hadoop MapReduce: Mapper not being called?


For the past few days I have been teaching myself Hadoop and trying to implement a basic BFS algorithm based on the information provided on this webpage. I had to make a few changes and additions to get the code to compile. At runtime I get the following error, which I have not been able to resolve even after hours of debugging. Can anyone help me?

Error:

15/05/11 03:04:20 WARN mapred.LocalJobRunner: job_local934121164_0001
java.lang.Exception: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1072)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:125)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
15/05/11 03:04:21 INFO mapreduce.Job: Job job_local934121164_0001 running in uber mode : false
15/05/11 03:04:21 INFO mapreduce.Job: map 0% reduce 0%

This should not happen, since I use the same key/value types in the mapper and the reducer, as shown below. The only explanation I can think of is that my mapper class is not being used at all and the default mapper class (which emits LongWritable keys) is being used instead. I am not sure what I am doing wrong.
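As far as I can tell, the base Mapper's map() is roughly an identity function, something like the simplified sketch below (not the actual Hadoop source), so with the default TextInputFormat it would simply forward the LongWritable byte-offset key, which matches the exception above:

protected void map(KEYIN key, VALUEIN value, Context context)
        throws IOException, InterruptedException {
    // identity behaviour: the input key (a LongWritable byte offset with
    // TextInputFormat) is written out unchanged, where a Text key is expected
    context.write((KEYOUT) key, (VALUEOUT) value);
}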

SearchMapper.java
import java.io.IOException;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Mapper.Context;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.LongWritable;

public class SearchMapper extends Mapper<Object, Text, Text, Text> {

    // Types of the input key, input value and the Context object through which
    // the Mapper communicates with the Hadoop framework
    public void map(Object key, Text value, Context context, Node inNode)
            throws IOException, InterruptedException {

        // For each GRAY node, emit each of the adjacent nodes as a new node
        // (also GRAY). If the adjacent node is already processed
        // and colored BLACK, the reducer retains the color BLACK.
        // Note that the mapper does not differentiate between BLACK, GRAY and WHITE.

        if (inNode.getColor() == Node.Color.GRAY) {
            for (String neighbor : inNode.getEdges()) {
                Node adjacentNode = new Node();

                // Remember that the current node only has the id of its
                // neighbour, and not the object itself. Therefore at this
                // stage there is no way of knowing and assigning any of
                // its other properties. Also remember that the reducer does
                // the 'clean up' task, not the mapper.
                adjacentNode.setId(neighbor);
                adjacentNode.setDistance(inNode.getDistance() + 1);
                adjacentNode.setColor(Node.Color.GRAY);
                adjacentNode.setParent(inNode.getId());
                context.write(new Text(adjacentNode.getId()), adjacentNode.getNodeInfo()); // getNodeInfo() returns a Text object
            }
            inNode.setColor(Node.Color.BLACK);
        }
        // Emit the input node, otherwise the BLACK color change (if it happens)
        // won't be persistent
        context.write(new Text(inNode.getId()), inNode.getNodeInfo());
    }
}

SearchReducer.java
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.io.Text;
import java.io.IOException;

public class SearchReducer extends Reducer<Text, Text, Text, Text> {

    // Types of the input key, the values associated with the key, the Context object
    // for the Reducer's communication with the Hadoop framework, and the node whose
    // information has to be output; the return type is a Node
    public Node reduce(Text key, Iterable<Text> values, Context context, Node outNode)
            throws IOException, InterruptedException {

        // set the node id as the key
        outNode.setId(key.toString());

        // TODO : (huh?) Since the values are of type Iterable, iterate through the
        // values associated with the key,
        // i.e. all the values corresponding to a particular node id

        for (Text value : values) {

            Node inNode = new Node(key.toString() + "\t" + value.toString());

            // Emit one node after combining all the mapper outputs

            // Only one node (the original) will have a non-null adjacency list
            if (inNode.getEdges().size() > 0) {
                outNode.setEdges(inNode.getEdges());
            }

            // Save the minimum distance and parent
            if (inNode.getDistance() < outNode.getDistance()) {
                outNode.setDistance(inNode.getDistance());
                outNode.setParent(inNode.getParent());
            }

            // Save the darkest color
            if (inNode.getColor().ordinal() > outNode.getColor().ordinal()) {
                outNode.setColor(inNode.getColor());
            }
        }
        context.write(key, new Text(outNode.getNodeInfo()));
        return outNode;
    }
}

BaseJob.java (the generic class from the website that basically does the setup work)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.io.Text;
import java.io.IOException;

public abstract class BaseJob extends Configured implements Tool {

    protected Job setupJob(String jobName, JobInfo jobInfo) throws Exception {

        Job job = new Job(new Configuration(), jobName);
        job.setJarByClass(jobInfo.getJarByClass());

        job.setMapperClass(jobInfo.getMapperClass());
        if (jobInfo.getCombinerClass() != null)
            job.setCombinerClass(jobInfo.getCombinerClass());
        job.setReducerClass(jobInfo.getReducerClass());

        // TODO : set number of reducers as required
        job.setNumReduceTasks(3);

        job.setOutputKeyClass(jobInfo.getOutputKeyClass());
        job.setOutputValueClass(jobInfo.getOutputValueClass());
        /*
        job.setJarByClass(SSSPJob.class);
        job.setMapperClass(SearchMapper.class);
        job.setReducerClass(SearchReducer.class);
        job.setNumReduceTasks(3);
        job.setOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);*/
        return job;
    }

    // Abstract class for the JobInfo object
    protected abstract class JobInfo {
        public abstract Class<?> getJarByClass();
        public abstract Class<? extends Mapper> getMapperClass();
        public abstract Class<? extends Reducer> getCombinerClass();
        public abstract Class<? extends Reducer> getReducerClass();
        public abstract Class<?> getOutputKeyClass();
        public abstract Class<?> getOutputValueClass();
    }
}

SSSPJob.java (the driver)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.fs.Path;

public class SSSPJob extends BaseJob {

    // method to set the configuration for the job and the mapper and reducer classes
    private Job getJobConf(String[] args) throws Exception {

        // Defining the abstract class objects
        JobInfo jobInfo = new JobInfo() {
            @Override
            public Class<? extends Reducer> getCombinerClass() {
                return null;
            }

            @Override
            public Class<?> getJarByClass() {
                return SSSPJob.class;
            }

            @Override
            public Class<? extends Mapper> getMapperClass() {
                return SearchMapper.class;
            }

            @Override
            public Class<?> getOutputKeyClass() {
                return Text.class;
            }

            @Override
            public Class<?> getOutputValueClass() {
                return Text.class;
            }

            @Override
            public Class<? extends Reducer> getReducerClass() {
                return SearchReducer.class;
            }
        };

        return setupJob("ssspjob", jobInfo);
    }

    // the driver to execute the job and invoke the map/reduce functions
    public int run(String[] args) throws Exception {
        int iterationCount = 0;
        Job job;
        // Number of GRAY nodes
        long terminationValue = 1;

        while (terminationValue > 0) {
            job = getJobConf(args);
            String input, output;

            // Setting the input file and output file for each iteration.
            // The first time, the user-specified file is the input, whereas
            // for subsequent iterations the output of the previous iteration
            // is the input.
            // NOTE: Please be clear about how the input/output files are set
            // before proceeding.

            // for the first iteration the input will be the first input argument
            if (iterationCount == 0)
                input = args[0];
            else
                // for the remaining iterations, the input will be the output of the previous iteration
                input = args[1] + iterationCount;

            output = args[1] + (iterationCount + 1);

            FileInputFormat.setInputPaths(job, new Path(input));
            FileOutputFormat.setOutputPath(job, new Path(output));

            job.waitForCompletion(true);

            Counters jobCntrs = job.getCounters();
            terminationValue = jobCntrs.findCounter(MoreIterations.numberOfIterations).getValue();
            // if the counter's value is incremented in the reducer(s), there are more
            // GRAY nodes to process, implying that the iteration has to continue
            iterationCount++;
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {

        int res = ToolRunner.run(new Configuration(), new SSSPJob(), args);
        if (args.length != 2) {
            System.err.println("Usage: <in> <output name> ");
            System.exit(1);
            System.out.println("Huh?");
        }
        System.exit(res);
    }
}

Also, I am not sure how debugging works on Hadoop. None of my debug print statements seem to have any effect, and I suspect the Hadoop framework writes log messages to some other location or file.
Thanks :)

Best Answer

Keys in an MR job should implement WritableComparable and values should implement Writable. I think your mapper code is declaring its parameters with extra/incorrect types rather than the ones the framework expects.
Just add the @Override annotation to your map and reduce methods so that the compiler reports an error when the signatures don't match.
Otherwise you see no compile-time error, but because of the signature mismatch your methods never override the base implementations, so the default (identity) mapper runs instead, which produces the type mismatch.
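For example, here is a minimal sketch of signatures the framework will actually call. It assumes the Node is constructed from the input value inside map/reduce rather than passed in as an extra parameter (that extra Node argument in your current methods is exactly what prevents them from overriding anything):

// imports as in your original files
public class SearchMapper extends Mapper<Object, Text, Text, Text> {
    @Override   // now the compiler complains if the signature does not match Mapper.map
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        Node inNode = new Node(value.toString());   // hypothetical: let Node parse its own input line
        // ... your existing GRAY-node expansion logic ...
        context.write(new Text(inNode.getId()), inNode.getNodeInfo());
    }
}

public class SearchReducer extends Reducer<Text, Text, Text, Text> {
    @Override   // now the compiler complains if the signature does not match Reducer.reduce
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Node outNode = new Node();                  // hypothetical: build the output node here
        outNode.setId(key.toString());
        // ... your existing merge logic over the values ...
        context.write(key, new Text(outNode.getNodeInfo()));
    }
}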
If you are processing plain text files, the input key of the map method should be LongWritable; if you want to use a custom key type, it should implement WritableComparable.
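If you do go down the custom-key route, a minimal sketch of what WritableComparable requires could look like this (NodeIdWritable is just a made-up name for illustration):

import org.apache.hadoop.io.WritableComparable;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class NodeIdWritable implements WritableComparable<NodeIdWritable> {
    private String id = "";

    public void set(String id) { this.id = id; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(id);               // serialize the key
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id = in.readUTF();              // deserialize in the same order as write()
    }

    @Override
    public int compareTo(NodeIdWritable other) {
        return id.compareTo(other.id);  // sort order used during the shuffle
    }

    @Override
    public int hashCode() {
        return id.hashCode();           // keeps the default HashPartitioner consistent
    }
}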

Regarding "java - Debugging in Hadoop MapReduce: Mapper not being called?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30157077/
