
hadoop - Merging two SortedMapWritable in Hadoop?

Reposted. Author: 行者123. Updated: 2023-12-02 20:11:46

I have defined a class named EquivalenceClsAggValue, which has a list-valued data field called aggValues.

public class EquivalenceClsAggValue extends Configured implements WritableComparable<EquivalenceClsAggValue> {

    public ArrayList<SortedMapWritable> aggValues;

It has a method that takes another object of type EquivalenceClsAggValue and merges that object's aggValues into this object's aggValues, as follows:
public void addEquivalenceCls(EquivalenceClsAggValue eq) {
    // comment: eq contains only one entry as it comes from the mapper

    if (this.aggValues.size() == 0) { // new line
        this.aggValues = eq.aggValues;
        return;
    }

    for (int i = 0; i < eq.aggValues.size(); i++) {

        SortedMapWritable cm = aggValues.get(i);    // cm: current map
        SortedMapWritable nm = eq.aggValues.get(i); // nm: new map
        Text nk = (Text) nm.firstKey();             // nk: new key

        if (cm.containsKey(nk)) { // increment the value
            IntWritable ovTmp = (IntWritable) cm.get(nk);
            int ov = ovTmp.get();
            cm.remove(nk);
            cm.put(nk, new IntWritable(ov + 1));
        } else { // add new entry
            cm.put(nk, new IntWritable(1));
        }
    }
}

However, this method does not merge the two aggValues. Can anyone help me fix it?
This is how I call the method:
public void reduce(IntWritable keyin, Iterator<EquivalenceClsAggValue> valuein,
        OutputCollector<IntWritable, EquivalenceClsAggValue> output, Reporter arg3) throws IOException {

    EquivalenceClsAggValue comOutput = valuein.next(); // initialize the output with the first input

    while (valuein.hasNext()) {
        EquivalenceClsAggValue e = valuein.next();
        comOutput.addEquivalenceCls(e);
    }
    output.collect(keyin, comOutput);
}

Best answer

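It looks like you are falling foul of object reuse. Hadoop reuses the same object, so every call to valuein.next() actually returns the same object reference, but the contents of that object are re-initialized through the readFields method. In your reduce method, comOutput and e therefore refer to the same instance, and each call to next() overwrites whatever comOutput had accumulated.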

Try changing it as follows (create a new instance to aggregate into):

EquivalenceClsAggValue comOutput = new EquivalenceClsAggValue();

while (valuein.hasNext()) {
    EquivalenceClsAggValue e = valuein.next();
    comOutput.addEquivalenceCls(e);
}
output.collect(keyin, comOutput);
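To see why aggregating into the first value returned by valuein.next() goes wrong, the reuse behaviour can be imitated without Hadoop. The following is only a simulation under stated assumptions: Box, reusingIterator, sumIntoFirst and sumIntoFresh are hypothetical names made up for this sketch, and the iterator merely mimics an iterator that refills one shared object on every call to next(); it is not Hadoop's actual implementation.

```java
import java.util.Iterator;

public class ReuseDemo {

    // Hypothetical mutable value, standing in for a Writable.
    static final class Box {
        int value;
    }

    // Hands back ONE Box instance and overwrites its contents on each next(),
    // imitating an iterator that refills a single shared object.
    static Iterator<Box> reusingIterator(int... values) {
        final Box shared = new Box();
        return new Iterator<Box>() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < values.length;
            }

            @Override
            public Box next() {
                shared.value = values[pos++]; // plays the role of readFields()
                return shared;
            }
        };
    }

    // Broken pattern: the accumulator is the first reference from the iterator.
    static int sumIntoFirst(Iterator<Box> it) {
        Box acc = it.next();
        while (it.hasNext()) {
            Box e = it.next(); // same object as acc, whose value was just overwritten
            acc.value += e.value;
        }
        return acc.value;
    }

    // Fixed pattern: the accumulator is a separate, freshly created object.
    static int sumIntoFresh(Iterator<Box> it) {
        Box acc = new Box();
        while (it.hasNext()) {
            acc.value += it.next().value;
        }
        return acc.value;
    }

    public static void main(String[] args) {
        System.out.println(sumIntoFirst(reusingIterator(5, 1, 1))); // prints 2
        System.out.println(sumIntoFresh(reusingIterator(5, 1, 1))); // prints 7
    }
}
```

In the broken variant the accumulator and the current element are the same object, so the running total is discarded on every step and the result is just twice the last value.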

EDIT: You will probably need to update your aggregate method too (to guard against object reuse):
public void addEquivalenceCls(EquivalenceClsAggValue eq) {
    // comment: eq contains only one entry as it comes from the mapper

    for (int i = 0; i < eq.aggValues.size(); i++) {
        // assumption: aggValues starts out as an empty list on a new instance,
        // so grow it before calling get(i) to avoid an IndexOutOfBoundsException
        while (aggValues.size() <= i) {
            aggValues.add(new SortedMapWritable());
        }

        SortedMapWritable cm = aggValues.get(i);    // cm: current map
        SortedMapWritable nm = eq.aggValues.get(i); // nm: new map
        Text nk = (Text) nm.firstKey();             // nk: new key

        if (cm.containsKey(nk)) { // increment the value
            // you don't need to remove and re-add, just update the IntWritable
            IntWritable ovTmp = (IntWritable) cm.get(nk);
            ovTmp.set(ovTmp.get() + 1);
        } else { // add new entry
            // be sure to create a copy of nk when you add it to the map
            cm.put(new Text(nk), new IntWritable(1));
        }
    }
}
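The advice to copy nk before putting it into the map can be illustrated in plain Java. This is a sketch only: MutableKey, count and copyKey are hypothetical names invented here, a TreeMap stands in for SortedMapWritable, and the loop assumes the key object is overwritten between iterations, which is the situation the copy guards against.

```java
import java.util.TreeMap;

public class KeyCopyDemo {

    // Hypothetical mutable key, standing in for Text.
    static final class MutableKey implements Comparable<MutableKey> {
        String s;

        MutableKey(String s) {
            this.s = s;
        }

        // Copy constructor, playing the role of new Text(nk).
        MutableKey(MutableKey other) {
            this.s = other.s;
        }

        @Override
        public int compareTo(MutableKey o) {
            return s.compareTo(o.s);
        }
    }

    // Counts words while one key object is reused for every word.
    // copyKey decides whether a defensive copy is stored in the map.
    static TreeMap<MutableKey, Integer> count(String[] words, boolean copyKey) {
        TreeMap<MutableKey, Integer> counts = new TreeMap<>();
        MutableKey reused = new MutableKey("");
        for (String w : words) {
            reused.s = w; // the same key object is overwritten each time
            Integer old = counts.get(reused);
            if (old != null) {
                counts.put(reused, old + 1); // key already present: only the value changes
            } else {
                counts.put(copyKey ? new MutableKey(reused) : reused, 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a"};
        System.out.println(count(words, false).size()); // prints 1
        System.out.println(count(words, true).size());  // prints 2
    }
}
```

Without the copy, the map holds the reused object itself, so every later lookup compares the key with itself and all words collapse into a single entry. With the copy, each stored key keeps the contents it had when it was inserted.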

Regarding hadoop - Merging two SortedMapWritable in Hadoop?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14286595/
