java - Hadoop mapreduce : Driver for chaining mappers within a MapReduce job

Reposted by: 可可西里 | Updated: 2023-11-01 14:13:49

I have a MapReduce job. This is my map class:

public static class MapClass extends Mapper<Text, Text, Text, LongWritable> {

    @Override
    public void map(Text key, Text value, Context context)
        throws IOException, InterruptedException {
    }
}

I want to use ChainMapper:

1. Job job = new Job(conf, "Job with chained tasks");
2. job.setJarByClass(MapReduce.class);
3. job.setInputFormatClass(TextInputFormat.class);
4. job.setOutputFormatClass(TextOutputFormat.class);

5. FileInputFormat.setInputPaths(job, new Path(InputFile));
6. FileOutputFormat.setOutputPath(job, new Path(OutputFile));

7. JobConf map1 = new JobConf(false);

8. ChainMapper.addMapper(
        job, 
        MapClass.class, 
        Text.class, 
        Text.class, 
        Text.class, 
        Text.class, 
        true, 
        map1
        );

But it reports an error at line 8:

Multiple markers at this line
    - Occurrence of 'addMapper'
    - The method addMapper(JobConf, Class<? extends Mapper<K1,V1,K2,V2>>, Class<? extends K1>, Class<? extends V1>, Class<? extends K2>, Class<? extends V2>, boolean, JobConf) in the type ChainMapper is not applicable for the arguments (Job, Class, Class, Class, Class, Class, boolean, Configuration)
    - Debug Current Instruction Pointer
    - The method addMapper(JobConf, Class<? extends Mapper<K1,V1,K2,V2>>, Class<? extends K1>, Class<? extends V1>, Class<? extends K2>, Class<? extends V2>, boolean, JobConf) in the type ChainMapper is not applicable for the arguments (JobConf, Class, Class, Class, Class, Class, boolean, JobConf)

Best answer

After a lot of "Kung Fu", I was able to use ChainMapper/ChainReducer. Thanks to user864846 for the last comment.

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package myPKG;

/* 
 * Ajitsen: Sample program for ChainMapper/ChainReducer. This program is a modified version of the WordCount example available in Hadoop-0.18.0. ChainMapper/ChainReducer were added and it was made to work in Hadoop 1.0.2.
 */

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ChainWordCount extends Configured implements Tool {

    public static class Tokenizer extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, 
                OutputCollector<Text, IntWritable> output, 
                Reporter reporter) throws IOException {
            String line = value.toString();
            System.out.println("Line:"+line);
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class UpperCaser extends MapReduceBase
    implements Mapper<Text, IntWritable, Text, IntWritable> {

        public void map(Text key, IntWritable value, 
                OutputCollector<Text, IntWritable> output, 
                Reporter reporter) throws IOException {
            String word = key.toString().toUpperCase();
            System.out.println("Upper Case:"+word);
            output.collect(new Text(word), value);    
        }
    }

    public static class Reduce extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, 
                Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            System.out.println("Word:"+key.toString()+"\tCount:"+sum);
            output.collect(key, new IntWritable(sum));
        }
    }

    static int printUsage() {
        System.out.println("wordcount <input> <output>");
        ToolRunner.printGenericCommandUsage(System.out);
        return -1;
    }

    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), ChainWordCount.class);
        conf.setJobName("wordcount");

        if (args.length != 2) {
            System.out.println("ERROR: Wrong number of parameters: " +
                    args.length + " instead of 2.");
            return printUsage();
        }
        FileInputFormat.setInputPaths(conf, args[0]);
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

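        // Each link in the chain gets its own private JobConf.
        // The boolean argument is byValue: true passes copies of each key/value
        // to the next link, which matters here because Tokenizer reuses its
        // 'word' and 'one' objects between collect() calls.
        JobConf mapAConf = new JobConf(false);
        ChainMapper.addMapper(conf, Tokenizer.class, LongWritable.class, Text.class, Text.class, IntWritable.class, true, mapAConf);

        // The input types of UpperCaser must match the output types of Tokenizer.
        JobConf mapBConf = new JobConf(false);
        ChainMapper.addMapper(conf, UpperCaser.class, Text.class, IntWritable.class, Text.class, IntWritable.class, true, mapBConf);

        // The reducer consumes the output of the last mapper in the chain.
        JobConf reduceConf = new JobConf(false);
        ChainReducer.setReducer(conf, Reduce.class, Text.class, IntWritable.class, Text.class, IntWritable.class, true, reduceConf);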

        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new ChainWordCount(), args);
        System.exit(res);
    }
}
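
The data flow of the chain above can be checked without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the names ChainSketch, PairMapper, TOKENIZER, UPPER_CASER and runChain are hypothetical stand-ins invented for illustration. It only shows the typing rule that the chain relies on, namely that the output key/value types of one mapper must be the input types of the next, and that the reducer sums what the last mapper emits.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Plain-Java stand-in for the chained job; no Hadoop classes are used.
class ChainSketch {

    // Hypothetical stand-in for a mapper: consumes one (key, value) pair and
    // emits zero or more pairs through the collector.
    interface PairMapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, BiConsumer<K2, V2> out);
    }

    // First link: split a line into words and emit (word, 1), like Tokenizer.
    static final PairMapper<Long, String, String, Integer> TOKENIZER =
        (offset, line, out) -> {
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) {
                    out.accept(w, 1);
                }
            }
        };

    // Second link: upper-case the key, like UpperCaser.
    // Its input types (String, Integer) equal TOKENIZER's output types.
    static final PairMapper<String, Integer, String, Integer> UPPER_CASER =
        (word, count, out) -> out.accept(word.toUpperCase(), count);

    // Feeds every pair emitted by the first mapper straight into the second,
    // then sums the values per key, as the Reduce class does.
    static Map<String, Integer> runChain(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            TOKENIZER.map(offset, line, (word, one) ->
                UPPER_CASER.map(word, one,
                    (upper, n) -> totals.merge(upper, n, Integer::sum)));
            offset += line.length() + 1; // rough byte offset of the next line
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {HADOOP=1, HELLO=2, WORLD=1}
        System.out.println(runChain(List.of("hello world", "Hello Hadoop")));
    }
}
```

Because "hello" and "Hello" are upper-cased before the reduce step, they are counted together, which is the visible effect of putting UpperCaser between Tokenizer and Reduce.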

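Edit: in recent versions (at least since Hadoop 2.6), the true flag in addMapper is not needed; the signature changed and dropped it. Note that the flag-less signature belongs to the new-API class org.apache.hadoop.mapreduce.lib.chain.ChainMapper, which takes a Job and a Configuration instead of two JobConf objects, and whose mappers extend org.apache.hadoop.mapreduce.Mapper. The old org.apache.hadoop.mapred.lib.ChainMapper used in the program above still expects the flag.

That is, with job being an org.apache.hadoop.mapreduce.Job:

Configuration mapAConf = new Configuration(false);
ChainMapper.addMapper(job, Tokenizer.class, LongWritable.class, Text.class,
                      Text.class, IntWritable.class, mapAConf);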
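
The compile error in the question comes from mixing the two APIs: a new-API Job is passed to the old-API org.apache.hadoop.mapred.lib.ChainMapper, which only accepts JobConf. One way to keep the question's Job-based driver is sketched below. It assumes a Hadoop 2.x release that ships org.apache.hadoop.mapreduce.lib.chain.ChainMapper and needs the Hadoop client libraries on the classpath. The class names NewApiChainDriver, LineMapper and UpperMapper are hypothetical and the mapper bodies are placeholders; treat it as a starting point rather than a tested job.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class NewApiChainDriver {

    // Hypothetical first mapper: TextInputFormat supplies (byte offset, line).
    public static class LineMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(Long.toString(key.get())), value);
        }
    }

    // Hypothetical second mapper: its input types match LineMapper's output types.
    public static class UpperMapper extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(value.toString().toUpperCase()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "Job with chained tasks");
        job.setJarByClass(NewApiChainDriver.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // New-API addMapper: a Job, no byValue flag, and a plain Configuration.
        ChainMapper.addMapper(job, LineMapper.class,
                LongWritable.class, Text.class, Text.class, Text.class,
                new Configuration(false));
        ChainMapper.addMapper(job, UpperMapper.class,
                Text.class, Text.class, Text.class, Text.class,
                new Configuration(false));

        // Map-only job for brevity; a reducer could be attached with ChainReducer.
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The first mapper takes LongWritable keys because the driver uses TextInputFormat; a mapper declared as Mapper<Text, Text, ...>, like the MapClass in the question, would need an input format that produces Text keys instead.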

Regarding "java - Hadoop mapreduce : Driver for chaining mappers within a MapReduce job", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6840922/
