
Hadoop MapReduce: iterating over the input values of a reduce call

Reposted by: 可可西里 | Updated: 2023-11-01 15:03:08

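I'm testing a simple MapReduce application, but I'm having trouble understanding what happens when I iterate over the input values of a reduce call.

Here is the piece of code that behaves strangely: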

public void reduce(Text key, Iterable<E> values, Context context)
    throws IOException, InterruptedException {

    Iterator<E> iterator = values.iterator();
    E first = iterator.next();  // keep a reference to the first value

    while (iterator.hasNext()) {
        E state = iterator.next();

        // expected: the same string on every pass; observed: it changes
        System.out.println(first.toString());
        // some other stuff
    }
    // some other stuff
}

Nothing unusual so far, except that every println call actually prints a different string. In other words, each time I call next(), the object that first refers to changes.

Why does this happen?

Best answer

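It's a bit counterintuitive, but it is actually documented in the API docs: Hadoop reuses the key and value objects it passes to reduce, so the iterator hands back the same instance on every next() call with new contents loaded into it. If you want to keep a value, you should clone it.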
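To make the effect easier to see, here is a minimal sketch in plain Java with no Hadoop dependency. It only simulates the reuse: `ReuseDemo`, `Holder`, and `reusingValues` are hypothetical names invented for this illustration, and `Holder` merely stands in for whatever value type `E` is in the question. The iterator returns one shared object and refills it on every call, which is roughly what the reducer's value iterator does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {

    /** Mutable value holder; a hypothetical stand-in for a Hadoop value type. */
    public static final class Holder {
        private String content = "";

        Holder() {}

        /** Copy constructor: takes a snapshot of the other holder's current content. */
        Holder(Holder other) { this.content = other.content; }

        void set(String s) { content = s; }

        @Override
        public String toString() { return content; }
    }

    /** Returns an Iterable whose iterator hands out ONE shared Holder, refilled per record. */
    public static Iterable<Holder> reusingValues(List<String> records) {
        return () -> new Iterator<Holder>() {
            private final Holder shared = new Holder();
            private int pos = 0;

            @Override
            public boolean hasNext() { return pos < records.size(); }

            @Override
            public Holder next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(records.get(pos++)); // overwrite the shared instance
                return shared;                  // same reference every time
            }
        };
    }

    /** Keeps the references as handed out; they all point at the shared object. */
    public static List<String> keepReferences(Iterable<Holder> values) {
        List<Holder> kept = new ArrayList<>();
        for (Holder v : values) kept.add(v);
        List<String> out = new ArrayList<>();
        for (Holder h : kept) out.add(h.toString());
        return out;
    }

    /** Copies each value before keeping it, so later next() calls cannot change it. */
    public static List<String> keepCopies(Iterable<Holder> values) {
        List<Holder> kept = new ArrayList<>();
        for (Holder v : values) kept.add(new Holder(v));
        List<String> out = new ArrayList<>();
        for (Holder h : kept) out.add(h.toString());
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println(keepReferences(reusingValues(records))); // [c, c, c]
        System.out.println(keepCopies(reusingValues(records)));     // [a, b, c]
    }
}
```

In a real reducer the copy step depends on the value type. For `Text`, the copy constructor `new Text(value)` should be enough; for other `Writable` types, `WritableUtils.clone(value, context.getConfiguration())` is one option, assuming the type can be instantiated and deserialized in the usual way.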

For more on iterating over the input values of a reduce call in Hadoop MapReduce, see the similar question on Stack Overflow: https://stackoverflow.com/questions/15976981/