
java - MapReduce output with ArrayWritable

Reposted. Author: 可可西里. Updated: 2023-11-01 14:15:32

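I am trying to get output from an ArrayWritable in a simple MapReduce job. I found several questions describing similar problems, but I could not solve the problem in my own code, so I am looking forward to your help. Thanks :)!

The input is a text file containing a few sentences.

The output should be: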

<Word, <length, number of same words in Textfile>>
Example: Hello 5 2
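
For reference, the intended result can be sketched in plain Java without Hadoop. The class name WordLengthSketch and the method summarize are made up for this illustration. The sketch uses the same whitespace tokenization as the mapper shown later in the question and assumes that "length" means the number of characters in the word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper, not part of the original job: computes the same
// <word, <length, count>> records in memory.
class WordLengthSketch {

    // Returns one "word length count" line per distinct word, sorted by word.
    static List<String> summarize(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            // Add 1 to the running count of this token.
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String word = e.getKey();
            lines.add(word + " " + word.length() + " " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : summarize("Hello world Hello test")) {
            System.out.println(line);
        }
    }
}
```

One difference to keep in mind: in Hadoop, Text.getLength() reports the length of the UTF-8 encoded bytes, so it matches the character count only for ASCII words.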

The output I get from my job is:

hello WordLength_V01$IntArrayWritable@221cf05
test WordLength_V01$IntArrayWritable@799e525a

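I think the problem is in the IntArrayWritable subclass, but I have not found the right correction. By the way, we are on Hadoop 2.5, and I use the following code to get this result:

Main method: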

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word length V1");

    // Set Classes
    job.setJarByClass(WordLength_V01.class);
    job.setMapperClass(MyMapper.class);
    // job.setCombinerClass(MyReducer.class);
    job.setReducerClass(MyReducer.class);

    // Set Output and Input Parameters
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntArrayWritable.class);

    // Number of Reducers
    job.setNumReduceTasks(1);

    // Set FileDestination
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

Mapper:

public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {

    // Initialize Variables
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Map Method
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

        // Use Tokenizer
        StringTokenizer itr = new StringTokenizer(value.toString());

        // Select each word
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());

            // Output Pair
            context.write(word, one);
        }
    }
}

Reducer:

public static class MyReducer extends Reducer<Text, IntWritable, Text, IntArrayWritable> {

    // Initialize Variables
    private IntWritable count = new IntWritable();
    private IntWritable length = new IntWritable();

    // Reduce Method
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

        // Count Words
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }

        count.set(sum);

        // Wordlength
        length.set(key.getLength());

        // Define Output
        IntWritable[] temp = new IntWritable[2];
        IntArrayWritable output = new IntArrayWritable(temp);

        temp[0] = count;
        temp[1] = length;

        // Output
        output.set(temp);
        context.write(key, new IntArrayWritable(output.get()));
    }
}

Subclass:

public static class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable(IntWritable[] intWritables) {
        super(IntWritable.class);
    }

    @Override
    public IntWritable[] get() {
        return (IntWritable[]) super.get();
    }

    @Override
    public void write(DataOutput arg0) throws IOException {
        for (IntWritable data : get()) {
            data.write(arg0);
        }
    }
}

I used the following links while looking for a solution:

I would really appreciate any ideas!

-------- Solution --------

New subclass:

public static class IntArrayWritable extends ArrayWritable {

    public IntArrayWritable(IntWritable[] values) {
        super(IntWritable.class, values);
    }

    @Override
    public IntWritable[] get() {
        return (IntWritable[]) super.get();
    }

    @Override
    public String toString() {
        IntWritable[] values = get();
        return values[0].toString() + ", " + values[1].toString();
    }
}
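
The toString() above hard-codes exactly two elements. If the array length can vary, the values could be joined in a loop instead. Below is a minimal sketch in plain Java; joinValues and JoinSketch are hypothetical names, not Hadoop APIs, and in the subclass the helper would be called with the array returned by get().

```java
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of Hadoop.
class JoinSketch {

    // Joins the string form of every element with ", ", whatever the array length.
    static String joinValues(Object[] values) {
        StringJoiner joiner = new StringJoiner(", ");
        for (Object value : values) {
            joiner.add(String.valueOf(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Plain Integers stand in for IntWritable values here.
        System.out.println(joinValues(new Object[] {2, 5}));
    }
}
```

Because Java arrays are covariant, an IntWritable[] can be passed where Object[] is expected, so toString() could simply return joinValues(get()).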

New reduce method:

public void reduce(Text key, Iterable<IntWritable> values,
        Context context) throws IOException, InterruptedException {

    // Count Words
    int sum = 0;
    for (IntWritable val : values) {
        sum += val.get();
    }

    count.set(sum);

    // Wordlength
    length.set(key.getLength());

    // Define Output
    IntWritable[] temp = new IntWritable[2];
    temp[0] = count;
    temp[1] = length;

    context.write(key, new IntArrayWritable(temp));
}

Best answer

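Everything else looks fine. You just need to add one more method to your subclass, printStrings(), that returns a single String instead of an array. The built-in toStrings() returns a String array, and your subclass does not override toString(), which is why the output shows an identifier such as IntArrayWritable@221cf05 (class name plus hash code) instead of the values. Because the text output is produced by calling toString() on the value, printStrings() only takes effect if toString() is overridden to return its result.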

public String printStrings() {
    // The array field of ArrayWritable is private, so read it through get()
    StringBuilder strings = new StringBuilder();
    for (IntWritable value : get()) {
        strings.append(" ").append(value.toString());
    }
    return strings.toString();
}
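
The root cause can be reproduced without a cluster. As far as I understand the default text output format, each record is written as the key, a tab, and then the result of calling toString() on the value. The sketch below imitates that with stand-in classes; PlainPair, PrintablePair, and formatRecord are hypothetical names, not Hadoop APIs.

```java
// Stand-ins for illustration only; none of these names are Hadoop APIs.
class PlainPair {
    final int count;
    final int length;

    PlainPair(int count, int length) {
        this.count = count;
        this.length = length;
    }
    // No toString() override: Object.toString() yields ClassName@hexHashCode.
}

class PrintablePair extends PlainPair {
    PrintablePair(int count, int length) {
        super(count, length);
    }

    @Override
    public String toString() {
        return count + ", " + length;
    }
}

class TextLineSketch {
    // Mimics how a text-based writer builds a line: key, tab, value.toString().
    static String formatRecord(String key, Object value) {
        return key + "\t" + value.toString();
    }

    public static void main(String[] args) {
        // First line ends in something like PlainPair@1b6d3586 (hash varies per run).
        System.out.println(formatRecord("hello", new PlainPair(2, 5)));
        // Second line shows the actual values.
        System.out.println(formatRecord("hello", new PrintablePair(2, 5)));
    }
}
```

The first call shows the same kind of output as in the question, and the second shows the effect of overriding toString().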

Regarding java - MapReduce output with ArrayWritable, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28914596/
