hadoop - MapReduce old API - passing a command-line argument to map


Reposted by: 可可西里 | Updated: 2023-11-01 16:59:37

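I am writing a MapReduce job with the old API that finds occurrences of a search string (passed as a command-line argument) in an input file stored in HDFS.

Here is my driver class: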

public class StringSearchDriver
{

    public static void main(String[] args) throws IOException
    {
        JobConf jc = new JobConf(StringSearchDriver.class);
        jc.set("SearchWord", args[2]);
        jc.setJobName("String Search");
        FileInputFormat.addInputPath(jc, new Path(args[0]));
        FileOutputFormat.setOutputPath(jc, new Path(args[1]));
        jc.setMapperClass(StringSearchMap.class);
        jc.setReducerClass(StringSearchReduce.class);
        jc.setOutputKeyClass(Text.class);
        jc.setOutputValueClass(IntWritable.class);
        JobClient.runJob(jc);
    }
}

Here is my Mapper class:

public class StringSearchMap extends MapReduceBase implements
        Mapper<LongWritable, Text, Text, IntWritable>
{
    String searchWord;

    @Override
    public void configure(JobConf jc)
    {
        searchWord = jc.get("SearchWord");
    }

    @Override
    public void map(LongWritable key, Text value,  
            OutputCollector<Text, IntWritable> out, Reporter reporter)
            throws IOException
    {
        String[] input = value.toString().split("");

        for(String word:input)
        {
            if (word.equalsIgnoreCase(searchWord))
                out.collect(new Text(word), new IntWritable(1));
        }
    }

}

When I run the job (the search string passed on the command line is "hi"), I get the following error:

14/09/21 22:35:41 INFO mapred.JobClient: Task Id : attempt_201409212134_0005_m_000001_2, Status : FAILED
java.lang.ClassCastException: interface javax.xml.soap.Text
    at java.lang.Class.asSubclass(Class.java:3129)
    at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:795)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:964)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:422)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

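Please advise.

Best answer

The wrong class was auto-imported: the code has import javax.xml.soap.Text where it needs import org.apache.hadoop.io.Text.

You can find an example of this wrong import in this blog.

One more point: it is better to adopt the new API.

Edit

I used the new API: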
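
The stack trace in the question shows the exception being raised by Class.asSubclass, called from JobConf.getOutputKeyComparator. Presumably Hadoop is checking there that the configured key class is a sortable Writable type, and javax.xml.soap.Text is an unrelated interface that only shares the simple name Text. The sketch below reproduces that kind of failure with the JDK alone. SortableKey, GoodKey and WrongKey are hypothetical stand-ins invented for illustration; they are not Hadoop types.

```java
public class AsSubclassDemo {
    // Hypothetical stand-in for the key contract Hadoop expects (not a Hadoop type).
    public interface SortableKey extends Comparable<Object> {}

    // Hypothetical stand-in for a correct key class such as org.apache.hadoop.io.Text.
    public static final class GoodKey implements SortableKey {
        @Override
        public int compareTo(Object other) {
            return 0;
        }
    }

    // Hypothetical stand-in for an unrelated interface such as javax.xml.soap.Text.
    public interface WrongKey {}

    // Returns "ok" if keyClass satisfies the contract, otherwise a description of the failure.
    public static String check(Class<?> keyClass) {
        try {
            keyClass.asSubclass(SortableKey.class);
            return "ok";
        } catch (ClassCastException e) {
            return "ClassCastException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(check(GoodKey.class));
        System.out.println(check(WrongKey.class));
    }
}
```

The second call fails in the same way as the job: the class object handed over is not a subtype of what the framework asks for, so the fix is in the import statement rather than in the mapper logic.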

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author Unmesha sreeveni
 * @Date 23 sep 2014
 */
public class StringSearchDriver extends Configured implements Tool {
    public static class Map extends
    Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            String line = value.toString();
            String searchString = conf.get("word");
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String token = tokenizer.nextToken();
                if(token.equals(searchString)){
                    word.set(token);
                    context.write(word, one);
                }

            }
        }
    }

    public static class Reduce extends
    Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {

            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        int res = ToolRunner.run(conf, new StringSearchDriver(), args);
        System.exit(res);

    }
    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("Usage: Search String <input dir> <output dir> <search word>");
            // Return instead of calling System.exit so ToolRunner's caller decides the exit code.
            return -1;
        }

        String source = args[0];
        String dest = args[1];
        String searchword = args[2];
        // Reuse the Configuration that ToolRunner passed in, so generic options are not discarded.
        Configuration conf = getConf();
        conf.set("word", searchword);
        Job job = new Job(conf, "Search String");
        job.setJarByClass(StringSearchDriver.class);
        FileSystem fs = FileSystem.get(conf);

        Path in = new Path(source);
        Path out = new Path(dest);
        if (fs.exists(out)) {
            fs.delete(out, true);
        }

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        boolean success = job.waitForCompletion(true);
        return (success ? 0 : 1);
    }
}

This works.
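
One difference between the two versions may matter beyond the import. The mapper in the question, as printed here, calls value.toString().split(""), which breaks the line into single characters, so a multi-character search word such as "hi" would never match (the original may have contained a space inside the quotes that was lost). The new-API mapper uses StringTokenizer, which splits on whitespace. The sketch below compares the two approaches with the JDK alone; the class and method names are made up for illustration. It uses equalsIgnoreCase as in the question, whereas the new-API mapper above compares with equals and is therefore case-sensitive.

```java
import java.util.StringTokenizer;

public class TokenizeDemo {
    // Counts matches using split(""), as in the question's mapper as printed.
    public static int countWithEmptySplit(String line, String searchWord) {
        int count = 0;
        for (String piece : line.split("")) {
            if (piece.equalsIgnoreCase(searchWord)) {
                count++;
            }
        }
        return count;
    }

    // Counts matches using whitespace tokenization, as in the new-API mapper.
    public static int countWithTokenizer(String line, String searchWord) {
        int count = 0;
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            if (tokenizer.nextToken().equalsIgnoreCase(searchWord)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "hi there Hi";
        System.out.println(countWithEmptySplit(line, "hi")); // prints 0
        System.out.println(countWithTokenizer(line, "hi"));  // prints 2
    }
}
```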

A similar question about hadoop - MapReduce old API - passing a command-line argument to map can be found on Stack Overflow: https://stackoverflow.com/questions/25962454/
