
hadoop - How do I select a subset of rows and create new tables from them in HBase?


I have a large table in HBase, and I would like to split it into several smaller tables so that it is easier to work with. (The original table should be kept.) How can I do that?

For example, I have a table named all whose row keys look like this:

animal-1, ...
plant-1, ...
animal-2, ...
plant-2, ...
human-1, ...
human-2, ...

I want to split it into three tables, one per kind of organism: animal, plant, and human. How can I do that?

Best Answer

You can use MapReduce with MultiTableOutputFormat, as in the example below.

In the example below, however, I am reading from a file, i.e. using TextInputFormat. In your case you have to read from the HBase table 'all' using TableInputFormat, and instead of Table1 and Table2 you have to write to 'animal', 'plant' and 'human'.

For your requirement: if you scan the HBase table and feed it to the Mapper via TableInputFormat, the row key is handed to the Mapper's map method along with the row. You only need to inspect the key there to decide which table the row should be written to (see the adapted sketch after the example below).

Please see 7.2.2. HBase MapReduce Read/Write Example
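
One prerequisite the example does not show: MultiTableOutputFormat only writes to tables that already exist, so the target tables must be created before the job runs. Below is a minimal sketch using the HBase 1.x Admin API; the class name CreateTargetTables and the column family name "cf" are placeholders of mine, and you should declare the same families your 'all' table actually uses.

package mapred;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateTargetTables {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            for (String name : new String[] { "animal", "plant", "human" }) {
                HTableDescriptor desc = new HTableDescriptor(TableName.valueOf(name));
                // "cf" is a placeholder; declare the same column families
                // that the source table 'all' uses.
                desc.addFamily(new HColumnDescriptor("cf"));
                admin.createTable(desc);
            }
        }
    }
}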

package mapred;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MultiTableMapper {

    static class InnerMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

        @Override
        public void map(LongWritable offset, Text value, Context context)
                throws IOException, InterruptedException {
            // The line of tab-separated data we are working on (needs to be parsed out).
            String[] valuestring = value.toString().split("\t");
            String rowid = /* HBaseManager.generateID(); */ "12345";

            // Write the first column to Table1 under the generated row key.
            Put put = new Put(Bytes.toBytes(rowid));
            put.add(Bytes.toBytes("UserInfo"), Bytes.toBytes("StudentName"),
                    Bytes.toBytes(valuestring[0]));
            // MultiTableOutputFormat interprets the output key as the name
            // of the table the Put should be routed to.
            context.write(new ImmutableBytesWritable(Bytes.toBytes("Table1")), put);

            // Write the second column to Table2 under the same row key.
            Put put1 = new Put(Bytes.toBytes(rowid));
            put1.add(Bytes.toBytes("MarksInfo"), Bytes.toBytes("Marks"),
                    Bytes.toBytes(valuestring[1]));
            context.write(new ImmutableBytesWritable(Bytes.toBytes("Table2")), put1);
        }
    }

    public static void createSubmittableJob()
            throws IOException, ClassNotFoundException, InterruptedException {
        Path inputDir = new Path("in");
        Configuration conf = /* HBaseManager.getHBConnection(); */ new Configuration();
        Job job = Job.getInstance(conf, "my_custom_job");
        job.setJarByClass(InnerMapper.class);
        FileInputFormat.setInputPaths(job, inputDir);
        job.setMapperClass(InnerMapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        // This is the key to writing to multiple tables in HBase.
        job.setOutputFormatClass(MultiTableOutputFormat.class);
        // job.setNumReduceTasks(0);
        // TableMapReduceUtil.addDependencyJars(job);
        // TableMapReduceUtil.addDependencyJars(job.getConfiguration());
        System.out.println(job.waitForCompletion(true));
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        MultiTableMapper.createSubmittableJob();
    }
}
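
Adapting the example above to the question: the sketch below reads directly from the HBase table 'all' via TableMapReduceUtil.initTableMapperJob (which configures TableInputFormat under the hood) and routes each row to 'animal', 'plant' or 'human' based on its key prefix. This is only a sketch under a few assumptions: the class names SplitTableJob and RouterMapper are invented, every row key is assumed to have the prefix-number form shown in the question, and the target tables are assumed to already exist with the same column families as 'all'.

package mapred;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class SplitTableJob {

    static class RouterMapper extends TableMapper<ImmutableBytesWritable, Put> {

        @Override
        public void map(ImmutableBytesWritable rowKey, Result result, Context context)
                throws IOException, InterruptedException {
            byte[] row = rowKey.copyBytes();
            String key = Bytes.toString(row);
            // "animal-1" -> "animal", "plant-2" -> "plant", "human-1" -> "human".
            // Assumes every row key contains a '-' separating type from number.
            String targetTable = key.substring(0, key.indexOf('-'));

            // Copy the source row cell for cell into a Put for the target table.
            Put put = new Put(row);
            for (Cell cell : result.rawCells()) {
                put.add(cell);
            }
            // MultiTableOutputFormat interprets the output key as the table name.
            context.write(new ImmutableBytesWritable(Bytes.toBytes(targetTable)), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "split_all_table");
        job.setJarByClass(SplitTableJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);       // larger batches are recommended for MapReduce scans
        scan.setCacheBlocks(false); // don't pollute the region server block cache

        // Sets up TableInputFormat over 'all' and wires in the mapper.
        TableMapReduceUtil.initTableMapperJob("all", scan, RouterMapper.class,
                ImmutableBytesWritable.class, Put.class, job);
        job.setOutputFormatClass(MultiTableOutputFormat.class);
        job.setNumReduceTasks(0); // map-only: rows go straight to the target tables

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Because each row is copied cell for cell, the data itself is unchanged, and the source table 'all' is only read from, which matches the requirement that the original table be preserved.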

Regarding "hadoop - How do I select a subset of rows and create new tables from them in HBase?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37762549/
