
hadoop - Map Reduce in Java Hadoop

Reposted. Author: 行者123. Updated: 2023-12-02 21:51:35

I am new to Hadoop. I have a file in the following format:

123textfinderafter. This is a fixed-width file, and I want to add a delimiter. Say my first field is 123 (length 3), the second field is textfinder (length 10), and the third field after that has length 5. Every field has a predefined length. The result should be 123|textfinder|after. I only have values (the lines in the file). What should the keys for the mapper and reducer be?

Thanks in advance.

Best Answer

You don't even need a reducer in this particular case. The mapper's input key-value pair is, as usual, the byte offset of the line (LongWritable) and the line itself (Text); you just write the line back with the delimiters added, using it as the output key. Check the following code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;



public class Delimiter extends Configured implements Tool {

    public static class DelimiterMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        // Split the fixed-width line into its three fields (widths 3, 10
        // and 5) and rejoin them with the delimiter.
        private static Text addDelimiter(Text value, char delimiter) {
            String str = value.toString();
            String ret = str.substring(0, 3) + delimiter
                    + str.substring(3, 13) + delimiter
                    + str.substring(13);
            return new Text(ret);
        }

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit the delimited line as the key; there is no separate value.
            context.write(addDelimiter(value, '|'), NullWritable.get());
        }
    }

    @Override
    public int run(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        if (args.length != 2) {
            System.err.println("Usage: Delimiter <in> <out>");
            return 2;
        }
        Job job = Job.getInstance(getConf());

        FileInputFormat.addInputPath(job, new Path(args[0]));
        Path outputDir = new Path(args[1]);
        // Hadoop refuses to overwrite existing output, so fail early with a
        // clear message.
        if (outputDir.getFileSystem(getConf()).exists(outputDir)) {
            throw new IOException("Output directory " + outputDir +
                    " already exists");
        }
        FileOutputFormat.setOutputPath(job, outputDir);
        job.setJobName("Delimiter");
        job.setJarByClass(Delimiter.class);
        job.setMapperClass(DelimiterMapper.class);
        // Map-only job: no reducers are needed.
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new Delimiter(), args);
        System.exit(res);
    }
}
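
If you want to verify the substring arithmetic outside Hadoop first, here is a minimal standalone sketch; the sample line "123textfinderafter" and the class name are assumptions for illustration, matching the field widths 3/10/5 from the question:

// Hypothetical standalone check of the fixed-width splitting logic,
// independent of Hadoop. The sample line is assumed, not from real data.
public class DelimiterCheck {
    public static void main(String[] args) {
        String str = "123textfinderafter";
        String ret = str.substring(0, 3) + '|'
                + str.substring(3, 13) + '|'
                + str.substring(13);
        System.out.println(ret); // prints: 123|textfinder|after
    }
}

Once the job is packaged into a jar, it would typically be launched with something like hadoop jar delimiter.jar Delimiter <in> <out> (the jar name here is assumed).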

Regarding hadoop - Map Reduce in Java Hadoop, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20273770/
