
java - Hadoop custom record reader implementation

Reposted. Author: 可可西里. Updated: 2023-11-01 16:52:12

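I am finding it hard to follow the flow of what happens in the nextKeyValue() method explained at the following link:

http://analyticspro.org/2012/08/01/wordcount-with-custom-record-reader-of-textinputformat/

In particular, the for loop inside nextKeyValue().

Any help would be greatly appreciated.

Thanks in advance.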

Best answer

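nextKeyValue() is the core function that sets the key-value pair for a given map call. In the code from your link, the part before the for loop simply sets the key to pos, which is the starting offset of the record (key.set(pos)), and clears whatever value was set by the previous call. The corresponding code: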

public boolean nextKeyValue() throws IOException, InterruptedException {
    if (key == null) {
        key = new LongWritable();
    }
    key.set(pos);
    if (value == null) {
        value = new Text();
    }
    value.clear();
    final Text endline = new Text("\n");
    int newSize = 0;

Next comes the for loop. I have added comments explaining each line.

    for (int i = 0; i < NLINESTOPROCESS; i++) {
        // This reader is meant to work like NLineInputFormat: it reads 3 lines at a
        // time and sets them as the value, so the loop repeats until that is satisfied.
        Text v = new Text();
        while (pos < end) {
            // Keeps the record reader from reading into the second split while it is
            // reading the first one. pos starts at the beginning of the split and end
            // is the end offset of the split.
            newSize = in.readLine(v, maxLineLength,
                    Math.max((int) Math.min(Integer.MAX_VALUE, end - pos), maxLineLength));
            // Calls LineReader.readLine, which reads until it encounters a newline (the
            // default delimiter for TextInputFormat) and stores the line in the Text
            // variable v. maxLineLength would be the maximum int value, so it only
            // guards against a single line growing beyond that limit.
            value.append(v.getBytes(), 0, v.getLength());
            // Appends the line held in v to value; append is needed because 3 lines
            // are accumulated into one value.
            value.append(endline.getBytes(), 0, endline.getLength());
            // Appends a newline after each line read.
            if (newSize == 0) {
                break; // Nothing left to read, so leave the loop.
            }
            pos += newSize;
            if (newSize < maxLineLength) {
                // The line fit within maxLineLength, so it was read in full: leave the
                // while loop and go on to the next of the 3 lines. The < comparison is
                // the same one Hadoop's own LineRecordReader uses; with >= the while
                // loop would keep consuming lines until the end of the split.
                break;
            }
        }
    }
    if (newSize == 0) {
        // Nothing was read: set key and value to null and return false.
        key = null;
        value = null;
        return false;
    } else {
        // Otherwise return true so the map call receives this key-value pair.
        return true;
    }
}
}
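
To watch the control flow without a Hadoop cluster, the same idea can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the linked post: the class NLineReaderSketch, its readLine helper, the readAll method, and the constant LINES_PER_RECORD (standing in for NLINESTOPROCESS) are all hypothetical, and a byte array stands in for the split. Unlike the code above, the sketch checks for end of input before appending anything and still returns a final record that has fewer than 3 lines.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, Hadoop-free stand-in for the reader discussed above (all names hypothetical).
class NLineReaderSketch {
    static final int LINES_PER_RECORD = 3; // plays the role of NLINESTOPROCESS

    private final byte[] data; // stands in for the split's input stream
    private final long end;    // end offset of the "split"
    private long pos;          // current read offset
    Long key;                  // start offset of the current record
    String value;              // the lines that make up the current record

    NLineReaderSketch(byte[] data, long start, long end) {
        this.data = data;
        this.pos = start;
        this.end = Math.min(end, data.length);
    }

    // Reads one line at pos into out; returns the bytes consumed, newline included.
    private int readLine(StringBuilder out) {
        int from = (int) pos;
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        out.append(new String(data, from, i - from, StandardCharsets.UTF_8));
        if (i < data.length) {
            i++; // consume the newline itself
        }
        return i - from;
    }

    // Sets key to the record's start offset and value to up to LINES_PER_RECORD lines.
    boolean nextKeyValue() {
        if (pos >= end) {
            key = null;
            value = null;
            return false;
        }
        key = pos;
        StringBuilder record = new StringBuilder();
        for (int i = 0; i < LINES_PER_RECORD && pos < end; i++) {
            StringBuilder line = new StringBuilder();
            int consumed = readLine(line);
            if (consumed == 0) {
                break; // nothing left to read
            }
            record.append(line).append('\n');
            pos += consumed; // advance by the bytes actually consumed
        }
        value = record.toString();
        return true;
    }

    // Collects every record of the text as offset -> value, in reading order.
    static Map<Long, String> readAll(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        NLineReaderSketch reader = new NLineReaderSketch(bytes, 0, bytes.length);
        Map<Long, String> records = new LinkedHashMap<>();
        while (reader.nextKeyValue()) {
            records.put(reader.key, reader.value);
        }
        return records;
    }

    public static void main(String[] args) {
        readAll("a\nbb\nccc\ndddd\ne\n").forEach((offset, lines) ->
                System.out.println(offset + " -> " + lines.replace("\n", "\\n")));
    }
}
```

For the five-line input in main, the sketch yields two records: the first is keyed by offset 0 and holds three lines, the second is keyed by offset 9 and holds the remaining two. The key is always the byte offset at which the record starts, which is what key.set(pos) does in the answer's code.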

Regarding java - Hadoop custom record reader implementation, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32110043/
