java - Hadoop map reduce always writes the same value

Reposted. Author: 可可西里. Updated: 2023-11-01 14:25:47

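I am trying to run a simple map reduce program in which the mapper writes two different values for the same key, but by the time they reach the reducer they always end up being the same.

Here is my code: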

import java.io.IOException;
import java.util.Vector;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class kaka {

    public static class Mapper4 extends Mapper<Text, Text, Text, Text> {
        @Override
        public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
            // emit two different values for the same key
            context.write(new Text("a"), new Text("b"));
            context.write(new Text("a"), new Text("c"));
        }
    }

    public static class Reducer4 extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            // collect every value seen for this key
            Vector<Text> vals = new Vector<Text>();
            for (Text val : values) {
                vals.add(val);
            }

            return;
        }
    }

    public static void main(String[] args) throws Exception {
        //deleteDir(new File("eran"));//todo
        Configuration conf = new Configuration();
        conf.set("mapred.map.tasks", "10"); // asking for more mappers (it's a recommendation)
        conf.set("mapred.max.split.size", "1000000"); // maximum size of an input split, in bytes

        Job job1 = new Job(conf, "find most similar words");
        job1.setJarByClass(kaka.class);
        job1.setInputFormatClass(SequenceFileInputFormat.class);
        job1.setMapperClass(Mapper4.class);
        job1.setReducerClass(Reducer4.class);
        job1.setOutputFormatClass(SequenceFileOutputFormat.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job1, new Path("vectors/part-r-00000"));
        FileOutputFormat.setOutputPath(job1, new Path("result"));
        // wait for the job once and use its result as the exit code
        System.exit(job1.waitForCompletion(true) ? 0 : 1);
    }

}

Best answer

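You are being bitten by object reuse while iterating over the values in the reducer. A long time ago a JIRA patch was applied to improve efficiency, and as a result the Key/Value objects passed to your mapper and the Key/Value objects passed to your reducer are always the same underlying object references; only the contents of those objects change with each iteration. Because your Vector stores the reference rather than the contents, every element ends up pointing at the same object, which holds whatever value was read last.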
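The aliasing effect can be reproduced without a cluster. The following is a minimal sketch in plain Java, not Hadoop code: `MutableValue` and `reusingIterable` are hypothetical stand-ins for `Text` and for the reducer's value iterator, and they only assume the behaviour described above (one instance, refilled on every step).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ObjectReuseDemo {

    // Hypothetical stand-in for a mutable value type such as Text.
    static final class MutableValue {
        private String content = "";
        void set(String s) { content = s; }
        @Override public String toString() { return content; }
    }

    // Hypothetical iterable that hands back the SAME instance on every
    // call to next(), only with new contents.
    static Iterable<MutableValue> reusingIterable(final List<String> raw) {
        return () -> new Iterator<MutableValue>() {
            private final MutableValue shared = new MutableValue();
            private int i = 0;
            @Override public boolean hasNext() { return i < raw.size(); }
            @Override public MutableValue next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(raw.get(i++));
                return shared;
            }
        };
    }

    // Stores the references as they arrive, so every stored element
    // aliases the one shared object.
    static List<String> collectByReference(List<String> raw) {
        List<MutableValue> kept = new ArrayList<>();
        for (MutableValue v : reusingIterable(raw)) {
            kept.add(v);
        }
        List<String> out = new ArrayList<>();
        for (MutableValue v : kept) {
            out.add(v.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Both entries show the last value read.
        System.out.println(collectByReference(Arrays.asList("b", "c"))); // prints [c, c]
    }
}
```

The list contains two entries, but both are the same object, so both report the contents written last.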

Modify your code to make a copy of the value before adding it to the vector:

public static class Reducer4 extends Reducer<Text,Text,Text,Text> {
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        Vector<Text> vals = new Vector<Text>();
        for (Text val : values){
            // make copy of val before adding to the Vector
            vals.add(new Text(val));
        }

        return;
    }
}
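The copy-before-store pattern can also be checked in isolation. This is a sketch in plain Java under stated assumptions: `MutableValue` is a hypothetical stand-in for `Text` (its copy constructor plays the role of `new Text(val)`), and `reusingIterable` is a hypothetical iterable that refills one shared instance. Converting each value to an immutable `String` right away is shown as a second option that avoids the aliasing in the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CopyBeforeStoreDemo {

    // Hypothetical stand-in for a mutable value type such as Text.
    static final class MutableValue {
        private String content;
        MutableValue(String s) { content = s; }
        // Copy constructor: snapshots the other object's current contents.
        MutableValue(MutableValue other) { content = other.content; }
        void set(String s) { content = s; }
        @Override public String toString() { return content; }
    }

    // Hypothetical iterable that refills and returns one shared instance.
    static Iterable<MutableValue> reusingIterable(final List<String> raw) {
        return () -> new Iterator<MutableValue>() {
            private final MutableValue shared = new MutableValue("");
            private int i = 0;
            @Override public boolean hasNext() { return i < raw.size(); }
            @Override public MutableValue next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(raw.get(i++));
                return shared;
            }
        };
    }

    // Option 1: copy each value before storing it.
    static List<MutableValue> collectCopies(List<String> raw) {
        List<MutableValue> kept = new ArrayList<>();
        for (MutableValue v : reusingIterable(raw)) {
            kept.add(new MutableValue(v));
        }
        return kept;
    }

    // Option 2: store an immutable String snapshot instead of the object.
    static List<String> collectStrings(List<String> raw) {
        List<String> kept = new ArrayList<>();
        for (MutableValue v : reusingIterable(raw)) {
            kept.add(v.toString());
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(collectCopies(Arrays.asList("b", "c")));  // prints [b, c]
        System.out.println(collectStrings(Arrays.asList("b", "c"))); // prints [b, c]
    }
}
```

Either option costs one allocation per value, which is the price of keeping values beyond the current iteration step.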

Regarding "java - Hadoop map reduce always writes the same value", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10978068/
