
java - Running a Hadoop MR job through Intellij IDEA

Reposted. Author: 行者123. Updated: 2023-12-02 21:46:19

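I have a map-only job configured to run in distributed mode. When I run it through the CLI, the job completes successfully. The launch command looks like this: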
hadoop jar FileHandy.jar com.company.MainRun arg1 arg2
But if I run it through the IDE (Intellij IDEA), it fails with an error (the Mapper class is not found):

14/07/30 01:07:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/07/30 01:07:34 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/07/30 01:07:35 INFO input.FileInputFormat: Total input paths to process : 1
14/07/30 01:07:36 INFO mapred.JobClient: Running job: job_201407300013_0001
14/07/30 01:07:37 INFO mapred.JobClient: map 0% reduce 0%
14/07/30 01:07:55 INFO mapred.JobClient: Task Id : attempt_201407300013_0001_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.expedia.eww.FileMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1617)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:191)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: Class com.expedia.eww.FileMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1523)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1615)
... 8 more

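I set up the IDE using only a Maven pom.xml with the dependencies (I use the jar file produced by IDEA's Build process rather than the Maven jar, but the result is the same with the Maven jar file). My IDE run configuration is as follows: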
Main class: org.apache.hadoop.util.RunJar
Program arguments: /path/to/jar/FileHandy.jar com.company.FileRun arg1 arg2
Working directory: set

Code snippet:
Job job = new Job(conf, "File2Hdfs");
job.setJarByClass(FileRun.class);
job.setMapperClass(FileMapper.class);
job.setInputFormatClass(NLineInputFormat.class);
job.setNumReduceTasks(0);
//FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost/user/cloudera/out111"));
FileOutputFormat.setOutputPath(job, new Path(arg0[1]));
FileInputFormat.addInputPath(job, new Path(fileForMapper));


return job.waitForCompletion(true) ? 0 : 1;

FileRun.class (with main) and FileMapper.class (the mapper) are both in the com.company package.

When the project is run, IDEA launches it as follows:
/usr/java/jdk1.6.0_32/bin/java -Didea.launcher.port=7547 -Didea.launcher.bin.path=/home/cloudera/Downloads/idea-IC-135.909/bin -Dfile.encoding=UTF-8 -classpath /usr/java/jdk1.6.0_32/jre/lib/rt.jar:/usr/java/jdk1.6.0_32/jre/lib/deploy.jar:/usr/java/jdk1.6.0_32/jre/lib/resources.jar:/usr/java/jdk1.6.0_32/jre/lib/jsse.jar:/usr/java/jdk1.6.0_32/jre/lib/management-agent.jar:/usr/java/jdk1.6.0_32/jre/lib/jce.jar:/usr/java/jdk1.6.0_32/jre/lib/plugin.jar:/usr/java/jdk1.6.0_32/jre/lib/charsets.jar:/usr/java/jdk1.6.0_32/jre/lib/javaws.jar:/usr/java/jdk1.6.0_32/jre/lib/ext/sunpkcs11.jar:/usr/java/jdk1.6.0_32/jre/lib/ext/dnsns.jar:/usr/java/jdk1.6.0_32/jre/lib/ext/localedata.jar:/usr/java/jdk1.6.0_32/jre/lib/ext/sunjce_provider.jar:/home/cloudera/IdeaProjects/MavenFileHandy/target/classes:/home/cloudera/.m2/repository/org/apache/hadoop/hadoop-client/2.0.0-mr1-cdh4.4.0/hadoop-client-2.0.0-mr1-cdh4.4.0.jar:/home/cloudera/.m2/repository/org/apache/hadoop/hadoop-common/2.0.0-cdh4.4.0/hadoop-common-2.0.0-cdh4.4.0.jar:/home/cloudera/.m2/repository/org/apache/hadoop/hadoop-annotations/2.0.0-cdh4.4.0/hadoop-annotations-2.0.0-cdh4.4.0.jar:/usr/java/jdk1.6.0_32/lib/tools.jar:/home/cloudera/.m2/repository/com/google/guava/guava/11.0.2/guava-11.0.2.jar:/home/cloudera/.m2/repository/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar:/home/cloudera/.m2/repository/org/apache/commons/commons-math/2.1/commons-math-2.1.jar:/home/cloudera/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/home/cloudera/.m2/repository/commons-codec/commons-codec/1.4/commons-codec-1.4.jar:/home/cloudera/.m2/repository/commons-io/commons-io/2.1/commons-io-2.1.jar:/home/cloudera/.m2/repository/commons-net/commons-net/3.1/commons-net-3.1.jar:/home/cloudera/.m2/repository/commons-el/commons-el/1.0/commons-el-1.0.jar:/home/cloudera/.m2/repository/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar:/home/cloudera/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/cloudera/.m2/repository/junit/junit/4.8.2/junit-4.8.2.jar:/home/cloudera/.m2/repository/commons-lang/commons-lang/2.5/commons-lang-2.5.jar:/home/cloudera/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/home/cloudera/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/home/cloudera/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/home/cloudera/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/home/cloudera/.m2/repository/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar:/home/cloudera/.m2/repository/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.jar:/home/cloudera/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar:/home/cloudera/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.8.8/jackson-core-asl-1.8.8.jar:/home/cloudera/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.8.8/jackson-mapper-asl-1.8.8.jar:/home/cloudera/.m2/repository/org/mockito/mockito-all/1.8.5/mockito-all-1.8.5.jar:/home/cloudera/.m2/repository/org/apache/avro/avro/1.7.4/avro-1.7.4.jar:/home/cloudera/.m2/repository/com/thoughtworks/paranamer/paranamer/2.3/paranamer-2.3.jar:/home/cloudera/.m2/repository/org/xerial/snappy/snappy-java/1.0.4.1/snappy-java-1.0.4.1.jar:/home/cloudera/.m2/repository/org/apache/commons/commons-compress/1.4.1/commons-compress-1.4.1.jar:/home/cloudera/.m2/repository/org/tukaani/xz/1.0/xz-1.0.jar:/home/cloudera/.m2/repository/com/google/protobuf/protobuf-java/2.4.0a/protobuf-java-2.4.0a.jar:/home/cloudera/.m2/repository/org/apache/hadoop/hadoop-auth/2.0.0-cdh4.4.0/hadoop-auth-2.0.0-cdh4.4.0.jar:/home/cloudera/.m2/repository/com/jcraft/jsch/0.1.42/jsch-0.1.42.jar:/home/cloudera/.m2/repository/org/apache/zookeeper/zookeeper/3.4.5-cdh4.4.0/zookeeper-3.4.5-cdh4.4.0.jar:/home/cloudera/.m2/repository/jline/jline/0.9.94/jline-0.9.94.jar:/home/cloudera/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.0.0-cdh4.4.0/hadoop-hdfs-2.0.0-cdh4.4.0.jar:/home/cloudera/.m2/repository/com/sun/jersey/jersey-core/1.8/jersey-core-1.8.jar:/home/cloudera/.m2/repository/com/sun/jersey/jersey-server/1.8/jersey-server-1.8.jar:/home/cloudera/.m2/repository/asm/asm/3.1/asm-3.1.jar:/home/cloudera/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/home/cloudera/.m2/repository/org/apache/hadoop/hadoop-core/2.0.0-mr1-cdh4.4.0/hadoop-core-2.0.0-mr1-cdh4.4.0.jar:/home/cloudera/.m2/repository/hsqldb/hsqldb/1.8.0.10/hsqldb-1.8.0.10.jar:/home/cloudera/Downloads/idea-IC-135.909/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain org.apache.hadoop.util.RunJar /home/cloudera/IdeaProjects/MavenFileHandy/target/FileHandy.jar com.company.FileRun arg1 arg2

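Why does the job throw an exception and fail to find the Mapper class when run through the IDE, yet complete successfully when the same job is launched with hadoop jar ...?

Thanks

Best answer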

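I have found the cause. The TaskTrackers could not run the job's tasks (map) because the jar file was not in the Distributed Cache. To fix this, the jar file needs to be added to the project classpath. The steps are: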

File -> Project Structure -> Libraries, click '+' in the bottom pane and add the jar file
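
One way to check this from the IDE is sketched below. It is not part of the original answer. The "No job jar file set" warning in the log suggests that job.setJarByClass(FileRun.class) could not resolve the driver class to a jar. In the IDEA command line, the classpath contains the target/classes directory, while FileHandy.jar appears only as an argument to RunJar. The driver class may therefore be loaded from a directory instead of a jar. The class name JobJarCheck and its helper methods are hypothetical names chosen for illustration, and the code uses only the JDK, not any Hadoop API.

```java
import java.io.File;
import java.security.CodeSource;
import java.util.regex.Pattern;

public class JobJarCheck {

    /** Returns true if any classpath entry has the given file name (e.g. "FileHandy.jar"). */
    public static boolean classpathContainsJar(String classpath, String separator, String jarFileName) {
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (new File(entry).getName().equals(jarFileName)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the location a class was loaded from, or null if the JVM does not report one. */
    public static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return null; // typically the case for JDK core classes
        }
        return source.getLocation().toString();
    }

    /** Returns true if the location string points at a jar file rather than a directory. */
    public static boolean isJar(String location) {
        return location != null && location.endsWith(".jar");
    }

    public static void main(String[] args) {
        // Jar name to look for; defaults to the name used in the question.
        String jarName = args.length > 0 ? args[0] : "FileHandy.jar";
        String classpath = System.getProperty("java.class.path");

        System.out.println("Jar on classpath: "
                + classpathContainsJar(classpath, File.pathSeparator, jarName));

        // In a real driver, FileRun.class would be the class to inspect here.
        String location = locationOf(JobJarCheck.class);
        System.out.println("Class loaded from: " + location);
        System.out.println("Loaded from a jar: " + isJar(location));
    }
}
```

If the same check is applied to FileRun.class at the start of the driver's main method and it reports a directory such as target/classes, that would be consistent with the warning in the log. After the jar is added as a project library, as described in the steps above, the first check should report that the jar is on the classpath.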

Regarding java - Running a Hadoop MR job through Intellij IDEA, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25018356/
