
java - Reducer complains about out-of-order input with a Bloom filter when attempting an HBase bulk load job


I am doing a large-scale HBase import with a map-reduce job that I set up like this:

job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
job.setMapperClass(BulkMapper.class);

job.setOutputFormatClass(HFileOutputFormat.class);

FileInputFormat.setInputPaths(job, new Path(inputPath));
FileOutputFormat.setOutputPath(job, new Path(outputPath));
HFileOutputFormat.configureIncrementalLoad(job, hTable); //This creates a text file that will be full of put statements, should take 10 minutes or so
boolean suc = job.waitForCompletion(true);
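
After the job completes, the HFiles under outputPath still have to be loaded into the table; a minimal sketch of that step, assuming conf is the job's Configuration and hTable is the same HTable passed to configureIncrementalLoad:

if (suc) {
    // Hand the generated HFiles to the cluster; the regions adopt them directly.
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(new Path(outputPath), hTable);
}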

The job uses a mapper of my own, and HFileOutputFormat.configureIncrementalLoad sets up the reducer. I have done a proof of concept with this setup before, but when I run it on a large data set it dies in the reducer with this error:

Error: java.io.IOException: Non-increasing Bloom keys: BLMX2014-02-03nullAdded after BLMX2014-02-03nullRemoved
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.appendGeneralBloomfilter(StoreFile.java:934)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:970)
    at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.write(HFileOutputFormat.java:168)
    at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.write(HFileOutputFormat.java:124)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:576)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
    at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce(PutSortReducer.java:78)
    at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce(PutSortReducer.java:43)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:645)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:405)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143

I thought Hadoop was supposed to guarantee sorted input to the reducer. If so, why am I running into this, and what can I do to avoid it?

Best answer

I am rather annoyed that this was the fix: the problem was in how I was keying my map output. I replaced my previous output with this:

ImmutableBytesWritable HKey = new ImmutableBytesWritable(put.getRow());
context.write(HKey, put);

Basically, the key I was emitting and the Put's own row key were slightly different. The shuffle sorts reducer input by the emitted key, not by the Put's row, so the reducer received the Put statements out of row order and the Bloom filter writer saw non-increasing keys.
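
For context, a minimal sketch of what the fixed map method might look like with the output keyed by the Put's own row; the input parsing, column family, and qualifier here are placeholders, and only the context.write pattern above is from the actual fix:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BulkMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Placeholder parsing: build the row key and the Put from the input line
        // however the data actually requires.
        String[] fields = line.toString().split("\t");
        Put put = new Put(Bytes.toBytes(fields[0]));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(fields[1]));

        // Key the map output with the Put's own row so the shuffle order matches
        // the row order HFileOutputFormat expects when appending Bloom keys.
        ImmutableBytesWritable hKey = new ImmutableBytesWritable(put.getRow());
        context.write(hKey, put);
    }
}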

Regarding java - Reducer complains about out-of-order input with a Bloom filter when attempting an HBase bulk load job, there is a similar question on Stack Overflow: https://stackoverflow.com/questions/25249182/
