
hadoop - Kite Dataset map-reduce

Reposted. Author: 可可西里. Updated: 2023-11-01 14:57:47

I am trying to run a map-reduce job using the kite-dataset API.

I used the following URLs as references.

https://community.cloudera.com/t5/Kite-SDK-includes-Morphlines/Map-Reduce-with-Kite/td-p/22165

https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-mapreduce/src/test/java/org/kitesdk/data/mapreduce/TestMapReduce.java

My code snippet is as follows:

import java.io.IOException;

import org.apache.avro.generic.GenericData;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.Datasets;
import org.kitesdk.data.mapreduce.DatasetKeyInputFormat;
import org.kitesdk.data.mapreduce.DatasetKeyOutputFormat;

public class MapReduce {

    private static final String sourceDatasetURI = "dataset:hive:test_avro";

    private static final String destinationDatasetURI = "dataset:hive:test_parquet";

    // Emits (value of the "index" field, 1) for every input record.
    private static class LineCountMapper
            extends Mapper<GenericData.Record, Void, Text, IntWritable> {
        @Override
        protected void map(GenericData.Record record, Void value,
                           Context context)
                throws IOException, InterruptedException {
            System.out.println("Record is " + record);
            context.write(new Text(record.get("index").toString()), new IntWritable(1));
        }
    }

    private Job createJob() throws Exception {
        System.out.println("Inside Create Job");
        // Job.getInstance() replaces the deprecated "new Job()" constructor.
        Job job = Job.getInstance();
        job.setJarByClass(getClass());

        // Both URIs use the hive scheme, so Kite's Hive support must be loadable at runtime.
        Dataset<GenericData.Record> inputDataset = Datasets.load(sourceDatasetURI, GenericData.Record.class);
        Dataset<GenericData.Record> outputDataset = Datasets.load(destinationDatasetURI, GenericData.Record.class);

        DatasetKeyInputFormat.configure(job).readFrom(inputDataset).withType(GenericData.Record.class);

        job.setMapperClass(LineCountMapper.class);
        DatasetKeyOutputFormat.configure(job).writeTo(outputDataset).withType(GenericData.Record.class);

        job.waitForCompletion(true);

        return job;
    }

    public static void main(String[] args) throws Exception {
        MapReduce httAvroToParquet = new MapReduce();
        httAvroToParquet.createJob();
    }
}

I am using an HDP 2.3.2 box, building an assembly jar, and submitting my application through spark-submit.

I get the following error when I submit the application.

2015-12-18 04:09:07,156 WARN [main] org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2015-12-18 04:09:07,282 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2015-12-18 04:09:07,333 WARN [main] org.kitesdk.data.spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
2015-12-18 04:09:07,334 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive://{}:9083/default/test_parquet. Check that JARs for hive datasets are on the classpath.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive://{}:9083/default/test_parquet. Check that JARs for hive datasets are on the classpath.
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:478)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:458)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1560)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:458)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:377)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1518)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
Caused by: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive://{}:9083/default/test_parquet. Check that JARs for hive datasets are on the classpath.
at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
at org.kitesdk.data.Datasets.load(Datasets.java:103)
at org.kitesdk.data.Datasets.load(Datasets.java:165)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.load(DatasetKeyOutputFormat.java:510)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.getOutputCommitter(DatasetKeyOutputFormat.java:473)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:476)
... 11 more

I don't understand what is going wrong. Is this a classpath problem? If so, where should I set the classpath?

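Best answer

You do indeed have a classpath problem.

Your project is missing org.kitesdk:kite-data-hive.

You can either

  • add this jar to your fat jar before submitting to Spark, or
  • tell Spark to add it to your classpath when you submit.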
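
Whichever option you pick, it can help to confirm that the Hive loader class is actually visible to the JVM. The following is only a minimal sketch, not part of Kite: the class name `ClasspathCheck` and the method `isOnClasspath` are hypothetical, and the loader class name is taken from the WARN line in the log above on the assumption that it ships in kite-data-hive. It only checks that the class can be located; it does not prove that the loader's own Hive dependencies are present, and it reports on the JVM it runs in, which is not necessarily the one that runs MRAppMaster.

```java
class ClasspathCheck {
    // Class named in the WARN line of the log; assumed to live in kite-data-hive.
    static final String HIVE_LOADER = "org.kitesdk.data.spi.hive.Loader";

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(HIVE_LOADER + " on classpath: " + isOnClasspath(HIVE_LOADER));
        // Print the effective classpath to see which jars this JVM was started with.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If this prints `false` when run with the same assembly jar you submit, the jar was probably built without kite-data-hive and the first option above is the one to look at.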

Regarding hadoop - Kite Dataset map-reduce, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/34353013/
