
hadoop - Why is the TreeMap reset after every reduce method call?

Reposted. Author: 可可西里. Updated: 2023-11-01 15:06:37

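In my reduce method I want to use a TreeMap field, reducedMap, to aggregate the incoming key/value pairs. However, the map appears to lose its state on every call to reduce: Hadoop ends up printing only the last value put into the TreeMap (plus the test value I added). Why is that? The same approach works as expected in my map method.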

    public static class TopReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        private TreeMap<Text, Integer> reducedMap = new TreeMap<Text, Integer>();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {

            int sum = 0;
            String strValues = "";
            for (IntWritable value : values) {
                sum += value.get();
                strValues += value.get() + ", ";
            }
            System.out.println("Map size Before: " + reducedMap);
            Integer val = sum;
            if (reducedMap.containsKey(key))
                val += reducedMap.get(key);
            // Only add, if value is of top 30.
            reducedMap.put(key, val);
            System.out.println("Map size After: " + reducedMap);
            reducedMap.put(new Text("test"), 77777);

            System.out.println("REDUCER: rcv: (" + key + "), " + "(" + sum
                    + "), (" + strValues + "):: new (" + val + ")");
        }

        /**
         * Flush top 30 context to the next phase.
         */
        @Override
        protected void cleanup(Context context) throws IOException,
                InterruptedException {
            System.out.println("-----FLUSHING TOP " + TOP_N
                    + " MAPPING RESULTS-------");
            System.out.println("MapSize: " + reducedMap);
            int i = 0;
            for (Entry<Text, Integer> entry : entriesSortedByValues(reducedMap)) {
                System.out.println("key " + entry.getKey() + ", value "
                        + entry.getValue());
                context.write(entry.getKey(), new IntWritable(entry.getValue()));

                if (i >= TOP_N)
                    break;
                else
                    i++;
            }
        }
    }

Best answer

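Hadoop reuses object references for efficiency. The key passed to each reduce call is the same Text instance with its contents replaced, not a new reference to a new object holding the new contents. So when you call reducedMap.put(key, val), the key matches the key already in the map, because it is the very same object. It is effectively the same as doing the following: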

    Text key = new Text("x");
    reducedMap.put(key, val); // map has size 1
    key.set("y");
    reducedMap.put(key, val); // map still has size 1: the key is compared
                              // with itself, so only the mapped value
                              // val is updated

You need to make a deep copy of the key before putting it into the map:

    reducedMap.put(new Text(key), val);
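
To see the effect without a Hadoop cluster, here is a minimal, self-contained sketch. It assumes a hypothetical MutableKey class as a stand-in for org.apache.hadoop.io.Text (mutable, comparable, with a copy constructor), and the class and method names are made up for illustration. It only mimics the way the framework overwrites one key object in place; it is not Hadoop code.

```java
import java.util.TreeMap;

public class KeyReuseDemo {

    // Hypothetical stand-in for org.apache.hadoop.io.Text:
    // a mutable key whose ordering depends on its current contents.
    public static final class MutableKey implements Comparable<MutableKey> {
        private String value;

        public MutableKey(String value) { this.value = value; }

        // Copy constructor, playing the role of new Text(key).
        public MutableKey(MutableKey other) { this.value = other.value; }

        public void set(String value) { this.value = value; }

        @Override
        public int compareTo(MutableKey other) { return value.compareTo(other.value); }

        @Override
        public String toString() { return value; }
    }

    // Stores the same key instance on every iteration, like the question's reducer.
    public static TreeMap<MutableKey, Integer> aggregateWithReusedKey(String[] keys) {
        TreeMap<MutableKey, Integer> map = new TreeMap<>();
        MutableKey reused = new MutableKey("");
        for (String k : keys) {
            reused.set(k);       // contents are overwritten in place
            map.put(reused, 1);  // the stored key is this same object, so it always matches
        }
        return map;
    }

    // Stores a fresh copy of the key on every iteration, like the suggested fix.
    public static TreeMap<MutableKey, Integer> aggregateWithCopiedKey(String[] keys) {
        TreeMap<MutableKey, Integer> map = new TreeMap<>();
        MutableKey reused = new MutableKey("");
        for (String k : keys) {
            reused.set(k);
            map.put(new MutableKey(reused), 1);  // the copy is unaffected by later set() calls
        }
        return map;
    }

    public static void main(String[] args) {
        String[] keys = {"x", "y", "z"};
        System.out.println(aggregateWithReusedKey(keys)); // prints {z=1}
        System.out.println(aggregateWithCopiedKey(keys)); // prints {x=1, y=1, z=1}
    }
}
```

With the reused key, the map holds a single entry whose key shows whatever was set last, which matches the "only the last value is printed" symptom in the question. With copied keys, each distinct key gets its own entry.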

Regarding "hadoop - Why is the TreeMap reset after every reduce method call?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20255982/
