
Hadoop PathFilter configuration is null

Reposted. Author: 可可西里. Updated: 2023-11-01 14:58:56

I have a path filter that looks like this:

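    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

    public class AvroFileInclusionFilter extends Configured implements PathFilter {
        // Local copy of the configuration handed to setConf()
        Configuration conf;

        @Override
        public void setConf(Configuration conf) {
            this.conf = conf;
        }

        @Override
        public boolean accept(Path path) {
            // Debug output: shows the value the filter actually sees
            System.out.println("FileInclusion: " + conf.get("fileInclusion"));

            return true;
        }
    }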

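I explicitly set the fileInclusion property in the configuration. For some reason, the configuration used inside the path filter is not the same one I set on the job, as shown below: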

    Job job = Job.getInstance(getConf(), "Stock Updater");

    job.getConfiguration().set("outputPath", opts.outputPath);

    String[] inputPaths = findPathsForDays(job.getConfiguration(),
            new Path(opts.inputPath), findDaysToQuery(job.getConfiguration(),
            opts.updatefile)).toArray(new String[]{});
    job.getConfiguration().set("fileInclusion", "hello`");

    AvroKeyValueInputFormat.addInputPath(job, new Path(opts.inputPath));
    job.getConfiguration().set("mapred.input.pathFilter.class", AvroFileInclusionFilter.class.getName());

    job.setInputFormatClass(AvroKeyValueInputFormat.class);

    LazyOutputFormat.setOutputFormatClass(job, AvroKeyValueOutputFormat.class);
    AvroKeyValueOutputFormat.setOutputPath(job, new Path(opts.outputPath));

    job.addCacheFile(new Path(opts.updatefile).toUri());

    AvroKeyValueOutputFormat.setCompressOutput(job, true);
    job.getConfiguration().set(AvroJob.CONF_OUTPUT_CODEC, snappyCodec().toString());

    AvroJob.setInputKeySchema(job, DateKey.SCHEMA$);
    AvroJob.setInputValueSchema(job, StockUpdated.SCHEMA$);
    AvroJob.setMapOutputKeySchema(job, DateKey.SCHEMA$);
    AvroJob.setMapOutputValueSchema(job, StockUpdated.SCHEMA$);
    AvroJob.setOutputKeySchema(job, DateKey.SCHEMA$);
    AvroJob.setOutputValueSchema(job, StockUpdated.SCHEMA$);

    job.setMapperClass(StockUpdaterMapper.class);
    job.setReducerClass(StockUpdaterReducer.class);

    AvroMultipleOutputs.addNamedOutput(job, "output", AvroKeyValueOutputFormat.class,
            DateKey.SCHEMA$, StockUpdated.SCHEMA$);

    job.setJarByClass(getClass());

    boolean success = job.waitForCompletion(true);

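conf.get("fileInclusion") is always null, and I can't figure out why. I have been working on this for quite a while and I'm nearly worn out. Why is the configuration different? I have submitted the job with both "hadoop jar" and "yarn jar".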

Best answer

Instead of creating the Job object by passing getConf() as the argument, try the following:

    Configuration conf = new Configuration();
    conf.set("outputPath", opts.outputPath);
    conf.set("mapred.input.pathFilter.class", AvroFileInclusionFilter.class.getName());
    // ...
    // Set every required key/value on the Configuration first, then create
    // the Job from it; the Job works on its own copy of this Configuration.
    // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
    Job job = Job.getInstance(conf, "Stock Updater");
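
The reasoning behind this suggestion is that a Hadoop Job takes its own copy of the Configuration passed to it, so a property set on the original object after the Job exists is not visible to the job. Below is a minimal sketch of that copy behaviour only. SimpleConf and SimpleJob are hypothetical stand-ins written for illustration, not Hadoop classes, so the sketch runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins, NOT the Hadoop classes: they only model the
// "a job keeps its own copy of the configuration" behaviour.
final class ConfCopyDemo {

    /** Stand-in for org.apache.hadoop.conf.Configuration. */
    static final class SimpleConf {
        private final Map<String, String> props = new HashMap<>();

        SimpleConf() {}

        /** Copy constructor: later changes to 'other' are not seen here. */
        SimpleConf(SimpleConf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    /** Stand-in for a job that copies the configuration it is given. */
    static final class SimpleJob {
        private final SimpleConf conf;

        SimpleJob(SimpleConf conf) { this.conf = new SimpleConf(conf); }

        SimpleConf getConfiguration() { return conf; }
    }

    /** Value the job sees when the property is set before the job is created. */
    static String setBeforeJobCreation() {
        SimpleConf conf = new SimpleConf();
        conf.set("fileInclusion", "hello");
        return new SimpleJob(conf).getConfiguration().get("fileInclusion");
    }

    /** Value the job sees when the property is set on the original afterwards. */
    static String setOnOriginalAfterJobCreation() {
        SimpleConf conf = new SimpleConf();
        SimpleJob job = new SimpleJob(conf);
        conf.set("fileInclusion", "hello");
        return job.getConfiguration().get("fileInclusion");
    }

    public static void main(String[] args) {
        System.out.println(setBeforeJobCreation());          // prints "hello"
        System.out.println(setOnOriginalAfterJobCreation()); // prints "null"
    }
}
```

In this model, a value set before the job is created reaches the job's copy, while a value set on the original object afterwards does not. With the real API, this means a property must either be set on the Configuration before the Job is created, or be set through job.getConfiguration().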

Regarding "Hadoop PathFilter configuration is null", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/22928420/
