
java - In Hadoop, when you try to save the value of each key-value pair into an array, why are all the elements you added the same?

Reposted. Author: 可可西里. Updated: 2023-11-01 15:19:03

I'm trying to store the values from the key-value pairs that the Map function receives so I can use them later. Given the following input:

Hello hadoop goodbye hadoop
Hello world goodbye world
Hello thinker goodbye thinker

and the following code:

Note: map is the simple WordCount example.

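import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.Tool;

public class Inception extends Configured implements Tool{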

public Path workingPath;

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();

// initialising the arrays that contain the values and the keys
public ArrayList<LongWritable> keyBuff = new ArrayList<LongWritable>();
public ArrayList<Text> valueBuff = new ArrayList<Text>();


public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);

while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
System.out.println(word + " / " + one);
}
}

public void innerMap(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

// adding the value to the buffer
valueBuff.add(value);
System.out.println("ArrayList addValue -> " + value);
for (Text v : valueBuff){
System.out.println("ArrayList containedValue -> " + v);
}

keyBuff.add(key);

}

public void run(Context context) throws IOException, InterruptedException {
setup(context);

// going over the key-value pairs and storing them into the arrays
while(context.nextKeyValue()){
innerMap(context.getCurrentKey(), context.getCurrentValue(), context);
}


Iterator<Text> itrv = valueBuff.iterator();
Iterator<LongWritable> itrk = keyBuff.iterator();
while(itrv.hasNext()){
LongWritable nextk = itrk.next();
Text nextv = itrv.next();
System.out.println("Value iterator -> " + nextv);
System.out.println("Key iterator -> " + nextk);

// iterating over the values and running the map on them.

map(nextk, nextv, context);
}

cleanup(context);
}
}

public int run(String[] args) throws Exception { ... }

public static void main (..) { ... }

Here is the resulting log output:

stdout log:

ArrayList addValue -> Hello hadoop goodbye hadoop
ArrayList containedValue -> Hello hadoop goodbye hadoop
ArrayList addValue -> Hello world goodbye world
ArrayList containedValue -> Hello world goodbye world
ArrayList containedValue -> Hello world goodbye world
ArrayList addValue -> Hello thinker goodbye thinker
ArrayList containedValue -> Hello thinker goodbye thinker
ArrayList containedValue -> Hello thinker goodbye thinker
ArrayList containedValue -> Hello thinker goodbye thinker
Value iterator -> Hello thinker goodbye thinker
Key iterator -> 84
Hello / 1
thinker / 1
goodbye / 1
thinker / 1
Value iterator -> Hello thinker goodbye thinker
Key iterator -> 84
Hello / 1
thinker / 1
goodbye / 1
thinker / 1
Value iterator -> Hello thinker goodbye thinker
Key iterator -> 84
Hello / 1
thinker / 1
goodbye / 1
thinker / 1

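As you can see, every time I add a new value to the ArrayList valueBuff, all the values already in the list are overwritten. Does anyone know why this happens, and why the values are not being added to the array correctly?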
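The overwriting described in the question can be reproduced without Hadoop. This is a minimal sketch that assumes `StringBuilder` as a stand-in for a reused Hadoop `Text` object; the class `ReuseDemo` and the method `bufferByReference` are hypothetical names chosen for illustration. A single mutable object is cleared and refilled for each line, so the list ends up holding three references to the same object.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    // Hypothetical helper: mimics a reader that refills ONE mutable buffer for every record
    // and a caller that stores that buffer directly, as innerMap does with valueBuff.
    static List<String> bufferByReference(String[] lines) {
        StringBuilder current = new StringBuilder(); // the single reused "value" object
        List<StringBuilder> buffer = new ArrayList<>();
        for (String line : lines) {
            current.setLength(0);   // clear the previous record's contents...
            current.append(line);   // ...and overwrite them with the new record
            buffer.add(current);    // adds a reference to the same object every time
        }
        // Read the buffer back only after all records have been consumed.
        List<String> snapshot = new ArrayList<>();
        for (StringBuilder sb : buffer) {
            snapshot.add(sb.toString());
        }
        return snapshot;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Hello hadoop goodbye hadoop",
            "Hello world goodbye world",
            "Hello thinker goodbye thinker"
        };
        // Prints the last line three times, like the "Value iterator" lines in the log.
        System.out.println(bufferByReference(lines));
    }
}
```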

Best answer

TextInputFormat uses LineRecordReader. When Context#nextKeyValue is called, it in turn calls LineRecordReader#nextKeyValue.

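LineRecordReader reuses the same key and value objects on every call to nextKeyValue and only changes their contents. If you want to keep the key and value data, you must make copies of those objects in your own code.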
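Below is a minimal sketch of that copying, written in plain Java so it runs without Hadoop. `MutableRecord` is a hypothetical stand-in for the reader's reused key/value pair, and `readAll` is a hypothetical simulation of the read loop that buffers either the shared object or a fresh copy; keys are byte offsets computed under the assumption of ASCII text with single `'\n'` separators. In the question's `innerMap`, the equivalent change would presumably be `valueBuff.add(new Text(value))` and `keyBuff.add(new LongWritable(key.get()))`. The keys are reused too, which is why every `Key iterator` line in the log prints the same number.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyDemo {
    /** Hypothetical stand-in for a reused Hadoop key/value pair: one holder that gets refilled. */
    static final class MutableRecord {
        long key;
        final StringBuilder value = new StringBuilder();

        void set(long newKey, String newValue) {
            key = newKey;
            value.setLength(0);
            value.append(newValue);
        }

        /** Independent copy, similar in spirit to new Text(value) / new LongWritable(key.get()). */
        MutableRecord copy() {
            MutableRecord c = new MutableRecord();
            c.set(key, value.toString());
            return c;
        }
    }

    /** Simulates a reader that reuses one record, buffering either references or copies. */
    static List<String> readAll(String[] lines, boolean copyEachRecord) {
        MutableRecord reused = new MutableRecord(); // allocated once, like the reader's key/value
        List<MutableRecord> buffer = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            reused.set(offset, line);                            // the "nextKeyValue" step: overwrite in place
            buffer.add(copyEachRecord ? reused.copy() : reused);
            offset += line.length() + 1;                         // next line's byte offset (ASCII, '\n' assumed)
        }
        List<String> out = new ArrayList<>();
        for (MutableRecord r : buffer) {
            out.add(r.key + ":" + r.value);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] lines = {"Hello hadoop goodbye hadoop", "Hello world goodbye world", "Hello thinker goodbye thinker"};
        System.out.println(readAll(lines, false)); // every entry shows the last key and value
        System.out.println(readAll(lines, true));  // each entry keeps its own key and value
    }
}
```

Copying costs one allocation per buffered record, so it is only worth doing when records genuinely need to outlive the next nextKeyValue call; buffering every record of a large input split this way can itself exhaust the heap.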

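This makes sense as an optimization: if a new key and value object were created for every record, the system could easily run out of memory (OOM).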

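Regarding "java - In Hadoop, when you try to save the value of each key-value pair into an array, why are all the elements you added the same?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8668592/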
