
Hadoop - MultipleInputs

Reprinted. Author: 可可西里. Updated: 2023-11-01 15:35:28

I am trying to use MultipleInputs in Hadoop. All of my mappers use FixedLengthInputFormat.

MultipleInputs.addInputPath(job,
    new Path(rootDir),
    FixedLengthInputFormat.class,
    OneToManyMapper.class);

The problem is that each mapper reads records with a different fixed record width.

config.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, ??);

Is there a way to pass a different FIXED_RECORD_LENGTH to each mapper when using MultipleInputs?

Thanks!

Best Answer

A workaround is as follows:

import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthRecordReader;

public class CustomFixedLengthInputFormat extends FixedLengthInputFormat {

    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // Here the record length can be controlled by the subclass itself,
        // instead of reading the single job-wide value via
        // getRecordLength(context.getConfiguration()).
        int recordLength = ??;
        if (recordLength <= 0) {
            throw new IOException("Fixed record length " + recordLength
                    + " is invalid. It should be set to a value greater than zero");
        }

        System.out.println("Record Length: " + recordLength);

        return new FixedLengthRecordReader(recordLength);
    }
}
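Building on that workaround, one concrete subclass can be declared per record width and each registered with MultipleInputs against its own input path. The sketch below assumes two hypothetical record widths (80 and 120) and illustrative path and mapper names (pathA, pathB, MapperA, MapperB), none of which come from the original question:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthRecordReader;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;

// One subclass per fixed record width; each hardcodes its own length
// rather than reading the single job-wide FIXED_RECORD_LENGTH setting.
class Fixed80InputFormat extends FixedLengthInputFormat {
    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        return new FixedLengthRecordReader(80); // assumed width for input A
    }
}

class Fixed120InputFormat extends FixedLengthInputFormat {
    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        return new FixedLengthRecordReader(120); // assumed width for input B
    }
}
```

Each input path then gets the format that matches its record width:

```java
MultipleInputs.addInputPath(job, new Path(pathA),
        Fixed80InputFormat.class, MapperA.class);
MultipleInputs.addInputPath(job, new Path(pathB),
        Fixed120InputFormat.class, MapperB.class);
```

This keeps the per-input record length in code; another option along the same lines is to read a custom, per-format configuration key inside each subclass instead of hardcoding the number.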

Regarding Hadoop - MultipleInputs, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26341913/
