
hadoop - How do I read from one HBase instance but write to another?

Reposted · Author: 可可西里 · Updated: 2023-11-01 14:14:06

I currently have two HBase tables (call them tableA and tableB). Using a single-stage MapReduce job, the data in tableA is read, processed, and saved to tableB. Right now both tables reside in the same HBase cluster, but I need to relocate tableB to its own cluster.

Is it possible to configure a single-stage MapReduce job in Hadoop to read from and write to different HBase instances?

Best Answer

It is possible. HBase's own CopyTable MapReduce job does it by using TableMapReduceUtil.initTableReducerJob(), which lets you set an alternate quorumAddress in case you need to write to a remote cluster:

public static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl)

quorumAddress - Distant cluster to write to; default is null for output to the cluster that is designated in hbase-site.xml. Set this String to the zookeeper ensemble of an alternate remote cluster when you would have the reduce write a cluster that is other than the default; e.g. copying tables between clusters, the source would be designated by hbase-site.xml and this param would have the ensemble address of the remote cluster. The format to pass is particular. Pass <hbase.zookeeper.quorum>:<hbase.zookeeper.client.port>:<zookeeper.znode.parent> such as server,server2,server3:2181:/hbase.
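The address string therefore always has exactly three colon-delimited parts, while the host list inside the first part uses commas. As a stdlib-only illustration of that shape (the splitQuorumAddress helper below is hypothetical, not part of HBase):

```java
public class QuorumAddressDemo {
    // Hypothetical helper: split "<hosts>:<port>:<znode>" into its three parts.
    // The host list may contain commas but no colons, so the first and last
    // colon in the string are the two delimiters.
    static String[] splitQuorumAddress(String quorumAddress) {
        int first = quorumAddress.indexOf(':');
        int last = quorumAddress.lastIndexOf(':');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException("expected <hosts>:<port>:<znode>");
        }
        return new String[] {
            quorumAddress.substring(0, first),        // comma-separated hosts
            quorumAddress.substring(first + 1, last), // ZooKeeper client port
            quorumAddress.substring(last + 1)         // znode parent, e.g. /hbase
        };
    }

    public static void main(String[] args) {
        String[] parts = splitQuorumAddress("server,server2,server3:2181:/hbase");
        System.out.println(parts[0]); // server,server2,server3
        System.out.println(parts[1]); // 2181
        System.out.println(parts[2]); // /hbase
    }
}
```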


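Wired together, a one-stage job that reads tableA locally and writes tableB remotely might look like the sketch below. This is an assumption-laden illustration, not code from the original answer: the class names, the pass-through mapper logic, and the remotezk1..3 hosts are all placeholders, and passing null as the reducer makes initTableReducerJob fall back to the identity reducer (as CopyTable does).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class CrossClusterCopy {

    // Minimal pass-through mapper: re-emit each source row as a Put
    public static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            Put put = new Put(row.get());
            for (Cell cell : value.rawCells()) {
                put.add(cell);
            }
            context.write(row, put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // source cluster from hbase-site.xml
        Job job = Job.getInstance(conf, "tableA -> remote tableB");
        job.setJarByClass(CrossClusterCopy.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // fewer RPC round-trips per mapper
        scan.setCacheBlocks(false);  // don't pollute the block cache with a full scan

        // Read side: tableA on the cluster named in hbase-site.xml
        TableMapReduceUtil.initTableMapperJob("tableA", scan, CopyMapper.class,
                ImmutableBytesWritable.class, Put.class, job);

        // Write side: tableB on the remote cluster; the fifth argument is its
        // ZooKeeper ensemble in <quorum>:<port>:<znode parent> form
        TableMapReduceUtil.initTableReducerJob("tableB", null, job, null,
                "remotezk1,remotezk2,remotezk3:2181:/hbase", null, null);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```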
Another option is to implement your own custom reducer that writes to the remote table directly instead of writing to the context. Something like this:

public static class MyReducer extends Reducer<Text, Result, Text, Text> {

    protected Connection connection;
    protected BufferedMutator remoteTable;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        // Clone the job configuration and point it at the remote cluster's quorum
        Configuration config = HBaseConfiguration.create(context.getConfiguration());
        config.set("hbase.zookeeper.quorum", "quorum1,quorum2,quorum3");
        connection = ConnectionFactory.createConnection(config); // HBase 1.0+ API
        // A BufferedMutator batches puts client-side; it replaces the old
        // HTable setAutoFlush(false)/setWriteBufferSize()/flushCommits() calls,
        // which no longer exist on the Table interface
        BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("myTable"))
                .writeBufferSize(1024L * 1024L * 10L); // 10 MB write buffer
        remoteTable = connection.getBufferedMutator(params);
    }

    @Override
    public void reduce(Text boardKey, Iterable<Result> results, Context context)
            throws IOException, InterruptedException {
        /* Build Puts and pass them to remoteTable.mutate(...) */
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        super.cleanup(context);
        if (remoteTable != null) {
            remoteTable.flush(); // push any puts still sitting in the write buffer
            remoteTable.close();
        }
        if (connection != null) {
            connection.close();
        }
    }
}
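With this approach every write goes through the reducer's own Connection, so the job itself needs no output sink. One way to reflect that in the driver (an assumption about your setup, not stated in the answer) is to pair the reducer with a NullOutputFormat:

```java
job.setReducerClass(MyReducer.class);
// All writes happen inside the reducer via its own Connection,
// so the MapReduce framework's output path is unused
job.setOutputFormatClass(org.apache.hadoop.mapreduce.lib.output.NullOutputFormat.class);
```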

Regarding "hadoop - How do I read from one HBase instance but write to another?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29547397/
