hadoop - Does the default sort in MapReduce use the Comparator defined in the WritableComparable class or the compareTo() method?

Reposted by: 可可西里 | Updated: 2023-11-01 16:56:30

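How is sorting performed in MapReduce before the output is passed from the mapper to the reducer? If my mapper output key is of type IntWritable, does the sort use the comparator defined in the IntWritable class or the class's compareTo() method, and if so, how is it invoked? If not, how is the sort performed and how is it invoked?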

Best answer

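Map output is first collected and then sent to the Partitioner, which decides which Reducer each record is sent to (at this point the records have not yet been grouped into reduce() calls). The default Partitioner does this by taking the key's hashCode() modulo the number of Reducers.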
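The partitioning rule described above can be sketched in plain Java without any Hadoop dependency. The class name `HashPartitionSketch`, the method `partitionFor`, and the reducer count are assumptions made for illustration; the arithmetic is intended to mirror what the default hash partitioner does, including masking off the sign bit so that a negative hashCode() cannot produce a negative partition index.

```java
// Plain-Java sketch of hash partitioning; not Hadoop's actual class.
class HashPartitionSketch {

    // Masking with Integer.MAX_VALUE clears the sign bit, so keys with a
    // negative hashCode() still map to a non-negative partition index.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // hypothetical reducer count
        int[] keys = {7, 42, -3, 42};
        for (int key : keys) {
            // Equal keys always land on the same reducer.
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the result depends only on hashCode(), a composite key that hashes only its primary field sends every record with the same primary field to the same reducer.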

After that, the Comparator is invoked to sort the map output. The flow looks like this:

Collector --> Partitioner --> Spill --> Comparator --> Local Disk <-- MapOutputServlet

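Each Reducer then copies the data that the partitioner assigned to it from the mappers and passes it to the Grouper, which determines how records are grouped for a single reduce function call: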

MapOutputServlet --> Copy to Local Disk --> Group --> Reduce

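Before the reduce function is called, the records also go through a sort phase that determines the order in which they reach the reducer. Unless a custom sort comparator is configured, the sorter is the WritableComparator obtained for the key class. The generic WritableComparator deserializes both keys and calls the key's compareTo() method (from the WritableComparable interface). IntWritable additionally registers its nested IntWritable.Comparator, which compares the serialized bytes directly and yields the same ordering as compareTo().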
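For an IntWritable key specifically, Hadoop registers a raw comparator (IntWritable.Comparator) that works on the serialized bytes instead of building key objects and calling compareTo(). The sketch below is a plain-Java illustration of why the two paths give the same order; it does not use Hadoop, and the class and method names (`RawVersusObjectCompare`, `compareRaw`, `compareObjects`) are hypothetical.

```java
import java.nio.ByteBuffer;

// Plain-Java illustration only; class and method names are hypothetical.
class RawVersusObjectCompare {

    // Encode an int as 4 big-endian bytes, the same layout DataOutput.writeInt produces.
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Raw path: decode the ints in place and compare them; no key objects are created.
    static int compareRaw(byte[] a, byte[] b) {
        int first = ByteBuffer.wrap(a).getInt();
        int second = ByteBuffer.wrap(b).getInt();
        return Integer.compare(first, second);
    }

    // Object path: rebuild key objects first, then delegate to compareTo().
    static int compareObjects(byte[] a, byte[] b) {
        Integer first = Integer.valueOf(ByteBuffer.wrap(a).getInt());
        Integer second = Integer.valueOf(ByteBuffer.wrap(b).getInt());
        return first.compareTo(second);
    }

    public static void main(String[] args) {
        int[][] pairs = {{-5, 3}, {7, 7}, {10, 2}};
        for (int[] pair : pairs) {
            byte[] a = serialize(pair[0]);
            byte[] b = serialize(pair[1]);
            // Both paths agree on the sign of the result.
            System.out.println(pair[0] + " vs " + pair[1] + ": raw=" + compareRaw(a, b)
                    + ", object=" + compareObjects(a, b));
        }
    }
}
```

Note that the raw path still decodes the integers rather than comparing bytes lexicographically, which would misorder negative values. The benefit of a raw comparator is avoiding object creation during the sort, not a different ordering.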

To give you a better idea, here is how you would implement a basic compareTo(), grouper, and sorter for a custom composite key:

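import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;

public class CompositeKey implements WritableComparable<CompositeKey> {
    private final IntWritable primaryField = new IntWritable();
    private final IntWritable secondaryField = new IntWritable();

    // Hadoop creates keys reflectively before calling readFields(), so a no-arg constructor is required.
    public CompositeKey() {
    }

    public CompositeKey(IntWritable p, IntWritable s) {
        // Copy the values rather than keeping references to the caller's mutable instances.
        this.primaryField.set(p.get());
        this.secondaryField.set(s.get());
    }

    public IntWritable getPrimaryField() {
        return primaryField;
    }

    public IntWritable getSecondaryField() {
        return secondaryField;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        this.primaryField.write(out);
        this.secondaryField.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.primaryField.readFields(in);
        this.secondaryField.readFields(in);
    }

    // Called by the partitioner to route map outputs to the same reducer instance.
    // If the hash source is simple (a primitive type or similar), a simple call to its hashCode() method is good enough.
    @Override
    public int hashCode() {
        return this.primaryField.hashCode();
    }

    // Kept consistent with compareTo(): keys are equal only when both fields match.
    @Override
    public boolean equals(Object obj) {
        return obj instanceof CompositeKey && compareTo((CompositeKey) obj) == 0;
    }

    @Override
    public int compareTo(CompositeKey other) {
        // Order by the primary field first and break ties with the secondary field.
        if (this.getPrimaryField().equals(other.getPrimaryField())) {
            return this.getSecondaryField().compareTo(other.getSecondaryField());
        } else {
            return this.getPrimaryField().compareTo(other.getPrimaryField());
        }
    }
}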

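import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Grouping comparator: keys with the same primary field are passed to a single reduce() call.
public class CompositeGroupingComparator extends WritableComparator {
    public CompositeGroupingComparator() {
        // true: let WritableComparator create CompositeKey instances to deserialize into.
        super(CompositeKey.class, true);
    }

    @Override
    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable a, WritableComparable b) {
        CompositeKey first = (CompositeKey) a;
        CompositeKey second = (CompositeKey) b;

        return first.getPrimaryField().compareTo(second.getPrimaryField());
    }
}

// Sorting comparator: orders keys by primary field, then secondary field, by delegating to compareTo().
public class CompositeSortingComparator extends WritableComparator {
    public CompositeSortingComparator() {
        super(CompositeKey.class, true);
    }

    @Override
    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable a, WritableComparable b) {
        CompositeKey first = (CompositeKey) a;
        CompositeKey second = (CompositeKey) b;

        return first.compareTo(second);
    }
}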
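To see how the sorting and grouping comparators interact, here is a small plain-Java simulation that does not depend on Hadoop. It is only a sketch: the class `SecondarySortSketch`, its `SORT` and `GROUP` comparators, and the sample keys are assumptions made for illustration, with each key modelled as an int pair {primary, secondary}.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of sort-then-group; names and data are hypothetical.
class SecondarySortSketch {

    // Sort order: primary field first, then secondary (the role of the sorting comparator).
    static final Comparator<int[]> SORT = (a, b) ->
            a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]);

    // Grouping order: primary field only (the role of the grouping comparator).
    static final Comparator<int[]> GROUP = (a, b) -> Integer.compare(a[0], b[0]);

    // Sorts the keys, then opens a new group whenever GROUP says two neighbours differ.
    // Each map entry stands for one reduce() call and the secondary values it would see.
    static Map<Integer, List<Integer>> sortAndGroup(List<int[]> keys) {
        List<int[]> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        int[] previous = null;
        for (int[] key : sorted) {
            if (previous == null || GROUP.compare(previous, key) != 0) {
                groups.put(key[0], new ArrayList<>());
            }
            groups.get(key[0]).add(key[1]);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<int[]> keys = new ArrayList<>();
        keys.add(new int[] {2, 9});
        keys.add(new int[] {1, 5});
        keys.add(new int[] {2, 3});
        keys.add(new int[] {1, 1});
        System.out.println(sortAndGroup(keys)); // prints {1=[1, 5], 2=[3, 9]}
    }
}
```

In an actual job, the two comparator classes are registered on the Job with setSortComparatorClass(...) and setGroupingComparatorClass(...); if no sort comparator is set, the key class's own ordering is used.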

Regarding "hadoop - Does the default sort in MapReduce use the Comparator defined in the WritableComparable class or the compareTo() method?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29184049/
