
hadoop - Creating a sequence file format for Hadoop MR

Reposted. Author: 可可西里. Updated: 2023-11-01 16:22:27

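I've run into a problem with Hadoop MapReduce. Currently my mapper's input key/value types are LongWritable, LongWritable, and its output key/value types are also LongWritable, LongWritable. The input format is SequenceFileInputFormat. Basically, I want to convert a .txt file to SequenceFile format so that I can use it as input to my mapper.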

Here is what I want to do.

The input file looks like this:

1\t2 (key = 1, value = 2)

2\t3 (key = 2, value = 3)

and so on...

I read the post "How to convert .txt file to Hadoop's sequence file format", but note that TextInputFormat only produces Key = LongWritable (the byte offset of the line) and Value = Text (the line itself).
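To see why the key is not the number at the start of each line: TextInputFormat hands the mapper one record per line, keyed by the byte offset where that line starts. The plain-Java sketch below imitates that pairing without any Hadoop classes. The class and method names are hypothetical, and it assumes UTF-8 text with `\n` line endings (no `\r\n` handling, no input splits).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper imitating TextInputFormat's (byte offset, line) records.
// Assumes UTF-8 input with '\n' line endings; no Hadoop classes are used.
public class TextRecordsSketch {

    // One record: the byte offset where the line starts, and the line text.
    public static final class OffsetLine {
        public final long offset;
        public final String line;

        public OffsetLine(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    public static List<OffsetLine> records(String fileContents) {
        List<OffsetLine> out = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            out.add(new OffsetLine(offset, line));
            // advance past the line's bytes plus the '\n' terminator
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        for (OffsetLine r : records("1\t2\n2\t3\n")) {
            System.out.println(r.offset + " -> " + r.line.replace("\t", "\\t"));
        }
    }
}
```

For the sample input the keys come out as 0 and 4, not 1 and 2, which is why a custom mapper is needed to turn each line into the desired (LongWritable, LongWritable) pair.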

Is there any way to take the .txt file and produce a sequence file with KV = LongWritable, LongWritable?

Best answer

Sure, it's basically the same as what I said in the other thread you linked, except that you have to implement your own Mapper.

Here's a quick sketch:

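import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class LongLongMapper extends
        Mapper<LongWritable, Text, LongWritable, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // assuming that your line contains key and value separated by \t;
        // the incoming key (the line's byte offset) is not needed
        String[] split = value.toString().split("\t");

        context.write(new LongWritable(Long.parseLong(split[0].trim())),
                new LongWritable(Long.parseLong(split[1].trim())));
    }

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Convert Text");
        job.setJarByClass(LongLongMapper.class);

        // register this mapper; the base Mapper.class would just pass the
        // (LongWritable offset, Text line) pairs through unchanged
        job.setMapperClass(LongLongMapper.class);

        // map-only job; increase if you need sorting or a special number of files
        // (the default identity Reducer is then used)
        job.setNumReduceTasks(0);

        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/output"));

        // submit and wait for completion; exit non-zero on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}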

Each call to the map function receives one line of input as its value, so we simply split it on the delimiter (a tab) and parse each part as a long.
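The per-line conversion can be tried outside Hadoop. Below is a hypothetical, Hadoop-free sketch of what the mapper does with each record: the byte-offset key is ignored and `k\tv` becomes the pair `(k, v)`. Unlike the mapper above, which would fail with an `ArrayIndexOutOfBoundsException` on a line without a tab, this sketch rejects malformed lines with an explicit `IllegalArgumentException`; the class and method names are illustrative only.

```java
// Hypothetical, Hadoop-free version of the mapper's parsing step.
// The offset argument mirrors the TextInputFormat key and is ignored.
public class LineToPairSketch {

    // Returns {key, value}; throws IllegalArgumentException on malformed lines.
    public static long[] toPair(long offsetIgnored, String line) {
        String[] split = line.split("\t");
        if (split.length != 2) {
            throw new IllegalArgumentException("expected key<TAB>value: " + line);
        }
        // Long.parseLong throws NumberFormatException, a subclass of
        // IllegalArgumentException, if either part is not a number
        return new long[] {
            Long.parseLong(split[0].trim()),
            Long.parseLong(split[1].trim())
        };
    }

    public static void main(String[] args) {
        long[] p = toPair(4L, "2\t3");
        System.out.println("key=" + p[0] + ", value=" + p[1]);
    }
}
```

Failing loudly is one design choice; in a real job you might prefer to skip bad lines so one stray record does not kill the whole task.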

That's it.
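On the `setNumReduceTasks(0)` line in the job: with zero reducers the job is map-only, so each map task writes its pairs to its own output file in input order. If you raise the count, Hadoop's default HashPartitioner sends each key to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, and each reducer sees its keys sorted and writes one file. The plain-Java sketch below models that; it uses `Long.hashCode` as a stand-in for LongWritable's `hashCode` (an assumption, the two may differ for large values), and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the shuffle: keys are assigned to reducers with the
// HashPartitioner formula, then sorted within each reducer.
// Long.hashCode stands in for LongWritable.hashCode (assumption).
public class PartitionSketch {

    public static int partitionFor(long key, int numReduceTasks) {
        return (Long.hashCode(key) & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns, per reducer, the keys it would receive in sorted order.
    public static List<List<Long>> shuffle(long[] keys, int numReduceTasks) {
        List<List<Long>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (long k : keys) {
            buckets.get(partitionFor(k, numReduceTasks)).add(k);
        }
        // each reducer sees its keys in sorted order
        for (List<Long> b : buckets) {
            Collections.sort(b);
        }
        return buckets;
    }

    public static void main(String[] args) {
        long[] keys = {5, 1, 4, 2, 3};
        System.out.println(shuffle(keys, 1)); // [[1, 2, 3, 4, 5]]
        System.out.println(shuffle(keys, 2)); // [[2, 4], [1, 3, 5]]
    }
}
```

A single reducer yields one globally sorted file; several reducers give several files, each sorted only within itself.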

For more on hadoop - Creating a sequence file format for Hadoop MR, see the similar question on Stack Overflow: https://stackoverflow.com/questions/12242979/
