
java - Hadoop: Processing different files with different Mappers and merging the results in a Reducer with a custom Writable


I am learning Hadoop.
I have two Mappers, each processing a different file, and one Reducer that combines the output of both Mappers.

Input:
File 1:

1,Abc
2,Mno
3,Xyz

File 2:
1,CS
2,EE
3,CS

Expected output:
1    1,Abc,CS
2    2,Mno,EE
3    3,Xyz,CS

Actual output:
1    1,,CS
2    2,Mno,
3    3,Xyz,

My code:

Mapper 1:
public class NameMapper extends MapReduceBase implements
        Mapper<LongWritable, Text, LongWritable, UserWritable> {

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<LongWritable, UserWritable> output, Reporter reporter)
            throws IOException {

        String[] val = value.toString().split(",");

        LongWritable id = new LongWritable(Long.parseLong(val[0]));
        Text name = new Text(val[1]);

        output.collect(id, new UserWritable(id, name, new Text("")));
    }
}

Mapper 2:
public class DepartmentMapper extends MapReduceBase implements
        Mapper<LongWritable, Text, LongWritable, UserWritable> {

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<LongWritable, UserWritable> output, Reporter reporter)
            throws IOException {

        String[] val = value.toString().split(",");

        LongWritable id = new LongWritable(Long.parseLong(val[0]));
        Text department = new Text(val[1]);

        output.collect(id, new UserWritable(id, new Text(""), department));
    }
}

Reducer:
public class JoinReducer extends MapReduceBase implements
        Reducer<LongWritable, UserWritable, LongWritable, UserWritable> {

    @Override
    public void reduce(LongWritable key, Iterator<UserWritable> values,
            OutputCollector<LongWritable, UserWritable> output,
            Reporter reporter) throws IOException {

        UserWritable user = new UserWritable();

        while (values.hasNext()) {

            UserWritable u = values.next();

            user.setId(u.getId());

            if (!(u.getName().equals(""))) {
                user.setName(u.getName());
            }

            if (!(u.getDepartment().equals(""))) {
                user.setDepartment(u.getDepartment());
            }
        }
        output.collect(user.getId(), user);
    }
}

Driver:
public class Driver extends Configured implements Tool {

    public int run(String[] args) throws Exception {

        JobConf conf = new JobConf(getConf(), Driver.class);
        conf.setJobName("File Join");

        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(UserWritable.class);

        conf.setReducerClass(JoinReducer.class);

        MultipleInputs.addInputPath(conf, new Path("/user/hadoop/join/f1"),
                TextInputFormat.class, NameMapper.class);

        MultipleInputs.addInputPath(conf, new Path("/user/hadoop/join/f2"),
                TextInputFormat.class, DepartmentMapper.class);

        Path output = new Path("/user/hadoop/join/output");
        FileSystem.get(new URI(output.toString()), conf).delete(output);

        FileOutputFormat.setOutputPath(conf, output);

        JobClient.runJob(conf);

        return 0;
    }

    public static void main(String[] args) throws Exception {
        int result = ToolRunner.run(new Configuration(), new Driver(), args);
        System.exit(result);
    }
}

UserWritable:
public class UserWritable implements Writable {

    private LongWritable id;
    private Text name;
    private Text department;

    public UserWritable() {
    }

    public UserWritable(LongWritable id, Text name, Text department) {
        super();
        this.id = id;
        this.name = name;
        this.department = department;
    }

    public LongWritable getId() {
        return id;
    }

    public void setId(LongWritable id) {
        this.id = id;
    }

    public Text getName() {
        return name;
    }

    public void setName(Text name) {
        this.name = name;
    }

    public Text getDepartment() {
        return department;
    }

    public void setDepartment(Text department) {
        this.department = department;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id = new LongWritable(in.readLong());
        name = new Text(in.readUTF());
        department = new Text(in.readUTF());
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(id.get());
        out.writeUTF(name.toString());
        out.writeUTF(department.toString());
    }

    @Override
    public String toString() {
        return id.get() + "," + name.toString() + "," + department.toString();
    }
}
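To check that the write/readFields pair above round-trips correctly, the same serialization pattern can be exercised with plain java.io streams. This sketch uses a simplified, hypothetical SimpleUser class with plain long/String fields instead of Hadoop's LongWritable/Text, but the field order and stream calls mirror UserWritable:

```java
import java.io.*;

// Hypothetical simplified analogue of UserWritable, using plain Java
// types instead of Hadoop's LongWritable/Text.
class SimpleUser {
    long id;
    String name;
    String department;

    // Same field order and calls as UserWritable.write()
    void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(name);
        out.writeUTF(department);
    }

    // Same field order and calls as UserWritable.readFields()
    void readFields(DataInput in) throws IOException {
        id = in.readLong();
        name = in.readUTF();
        department = in.readUTF();
    }
}

public class RoundTrip {
    public static void main(String[] args) throws IOException {
        SimpleUser u = new SimpleUser();
        u.id = 1; u.name = "Abc"; u.department = "CS";

        // Serialize to an in-memory buffer, then read it back.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        u.write(new DataOutputStream(buf));

        SimpleUser copy = new SimpleUser();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));

        System.out.println(copy.id + "," + copy.name + "," + copy.department);
        // prints 1,Abc,CS
    }
}
```

The round-trip confirms the serialization itself is sound, which narrows the bug down to the reducer logic rather than the Writable.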

The Reducer should receive two UserWritable objects for each user id: the first carrying the id and name, the second carrying the id and department.
Can anyone explain where I went wrong?

Best answer

I found the problem in my code:

u.getName() 

returns a Text object, so comparing it to the String "" with equals() is always false, and the name/department fields are never copied over.
Using u.getName().toString() (and likewise u.getDepartment().toString()) in the comparison solved the problem.
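The pitfall generalizes: Object.equals() across unrelated types always returns false, so a Text can never equal a String even when both hold the same characters. A minimal sketch, using a hypothetical FakeText stand-in (not the real org.apache.hadoop.io.Text, so the example runs without Hadoop on the classpath):

```java
// Hypothetical stand-in for Hadoop's Text, just to illustrate the pitfall.
final class FakeText {
    private final String value;
    FakeText(String value) { this.value = value; }
    // equals() only matches other FakeText instances, never Strings --
    // the real Text behaves the same way toward String arguments.
    @Override public boolean equals(Object o) {
        return o instanceof FakeText && ((FakeText) o).value.equals(value);
    }
    @Override public int hashCode() { return value.hashCode(); }
    @Override public String toString() { return value; }
}

public class EqualsPitfall {
    public static void main(String[] args) {
        FakeText name = new FakeText("");
        // Cross-type comparison: always false, even for two "empty" values.
        System.out.println(name.equals(""));            // prints false
        // Comparing String representations works as intended.
        System.out.println(name.toString().equals("")); // prints true
    }
}
```

Applied to the reducer, the guard becomes `if (!u.getName().toString().equals(""))`, and similarly for the department check.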

For "java - Hadoop: Processing different files with different Mappers and merging the results in a Reducer with a custom Writable", see the similar question on Stack Overflow: https://stackoverflow.com/questions/28448173/
