hadoop - Why does HBase KeyValueSortReducer need to sort all KeyValues?

Reposted by: 可可西里 | Updated: 2023-11-01 14:30:56

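I have recently been studying Phoenix CSV Bulk Load, and I found that org.apache.phoenix.mapreduce.CsvToKeyValueReducer can run out of memory (Java heap OOM) when a single row has many columns. In my case each row has 44 columns and the average row size is 4 KB.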

That class is also very similar to KeyValueSortReducer, the reducer HBase uses for bulk loads, so the same OOM could occur in my case if I used KeyValueSortReducer.

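So my question about KeyValueSortReducer is: why does it first sort all the KeyValues of a row in a TreeSet and only then write them to the context? If I remove the TreeSet sorting code and write every KeyValue straight to the context, will the result be different or wrong?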

Looking forward to your reply. Best wishes!

Here is the source code of KeyValueSortReducer:

import java.util.TreeSet;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class KeyValueSortReducer extends Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, KeyValue> {
  protected void reduce(ImmutableBytesWritable row, java.lang.Iterable<KeyValue> kvs,
      org.apache.hadoop.mapreduce.Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, KeyValue>.Context context)
  throws java.io.IOException, InterruptedException {
    // Buffer every KeyValue of this row in memory, ordered by KeyValue.COMPARATOR.
    TreeSet<KeyValue> map = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
    for (KeyValue kv: kvs) {
      try {
        // Clone each value: Hadoop reuses the same object while iterating.
        map.add(kv.clone());
      } catch (CloneNotSupportedException e) {
        throw new java.io.IOException(e);
      }
    }
    context.setStatus("Read " + map.getClass());
    int index = 0;
    // Emit the row's KeyValues in sorted order.
    for (KeyValue kv: map) {
      context.write(row, kv);
      if (++index % 100 == 0) context.setStatus("Wrote " + index);
    }
  }
}

Best answer

Please take a look at this case study. In some cases the KeyValue pairs of a row must be written to the HFile in sorted order.
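To make the ordering requirement concrete, here is a minimal plain-Java sketch that does not use HBase at all. It assumes two things that are not stated in the question: that the shuffle orders reducer input only by row key, so the KeyValues of one row can arrive in any order, and that the HFile writer rejects a cell that sorts before the previous one. `Cell`, `CELL_ORDER` and `OrderCheckingWriter` are hypothetical stand-ins for `KeyValue`, `KeyValue.COMPARATOR` and the HFile writer, and the comparator is simplified to family, qualifier, then newest timestamp first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class CellOrderSketch {

    /** Hypothetical, simplified stand-in for the KeyValues of a single row. */
    static final class Cell {
        final String family;
        final String qualifier;
        final long timestamp;

        Cell(String family, String qualifier, long timestamp) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return family + ":" + qualifier + "@" + timestamp;
        }
    }

    /** Assumed ordering: family, then qualifier, then newest timestamp first. */
    static final Comparator<Cell> CELL_ORDER = (a, b) -> {
        int c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    };

    /** Hypothetical writer that, like a sorted file format, refuses out-of-order cells. */
    static final class OrderCheckingWriter {
        private Cell last;

        void append(Cell cell) {
            if (last != null && CELL_ORDER.compare(last, cell) > 0) {
                throw new IllegalStateException("Out of order: " + cell + " after " + last);
            }
            last = cell;
        }
    }

    /** Returns true if every cell can be appended without violating the order. */
    static boolean writesCleanly(Iterable<Cell> cells) {
        OrderCheckingWriter writer = new OrderCheckingWriter();
        try {
            for (Cell cell : cells) {
                writer.append(cell);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    /** Mirrors the reducer: buffer the whole row in a TreeSet, then read it back sorted. */
    static List<Cell> sortedLikeReducer(List<Cell> cells) {
        TreeSet<Cell> sorted = new TreeSet<>(CELL_ORDER);
        sorted.addAll(cells);
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        // Arrival order within one row is arbitrary after the shuffle.
        List<Cell> arrival = Arrays.asList(
            new Cell("cf", "q2", 1L), new Cell("cf", "q1", 1L), new Cell("cf", "q1", 2L));
        System.out.println("unsorted accepted: " + writesCleanly(arrival));
        System.out.println("sorted accepted:   " + writesCleanly(sortedLikeReducer(arrival)));
    }
}
```

Under those assumptions, writing the cells in arrival order fails while the TreeSet-sorted order succeeds, which suggests that dropping the sort would produce failed or invalid output rather than merely different output. The sketch also shows the cost the question points at: the whole row is held in memory before anything is written.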

For "hadoop - Why does HBase KeyValueSortReducer need to sort all KeyValues?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37047145/
