gpt4 book ai didi

hadoop - 使用hadoop执行jar文件

转载 作者:可可西里 更新时间:2023-11-01 16:26:31 26 4
gpt4 key购买 nike

我想执行一个 jar 文件,它在从命令行执行时工作正常:

java -Xmx3g -jar jarname.jar -T class_name_in_jar -R filename1 -I filename2 -known filename3 -o filename4

以上命令通过输入文件名 1、文件名 2 和文件名 3 来执行 *class_name_in_jar*。它将在 filename4 中生成输出。

这是我的 map reduce 程序:

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class GatkWordCount {

public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
String find_targets_cmd = "java -Xmx3g -jar <jarname>.jar -T <class name in jar> -R <filename1> -I <filename2> -known <filename3> -o <filename4>";

exceptionOnError(execAndReconnect(find_targets_cmd));
}
}

public static int execAndReconnect(String cmd) throws IOException {
Process p = Runtime.getRuntime().exec(cmd);
p.waitFor();
return p.exitValue();
}

public static void exceptionOnError(int errorCode) throws IOException{
if(0 != errorCode)
throw new IOException(String.valueOf(errorCode));
}

public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(GatkWordCount.class);
conf.setJobName("GatkWordCount");

conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);

conf.setReducerClass(Reduce.class);

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);

FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));

JobClient.runJob(conf);
}
}

HDFS 中,我已经放置了所有需要的输入文件。我执行了以下命令:

  enter code herehadoop/bin/hadoop jar gatkword.jar GatkWordCount /user/hduser/gatkinput/gatkinput/group.bam /user/hduser/gatkword2

下面是执行上述命令后得到的错误信息:

13/12/29 17:58:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/29 17:58:59 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/12/29 17:58:59 WARN snappy.LoadSnappy: Snappy native library not loaded
13/12/29 17:58:59 INFO mapred.FileInputFormat: Total input paths to process : 1
13/12/29 17:58:59 INFO mapred.JobClient: Running job: job_201312261425_0013
13/12/29 17:59:00 INFO mapred.JobClient: map 0% reduce 0%
13/12/29 17:59:06 INFO mapred.JobClient: Task Id : attempt_201312261425_0013_m_000000_0, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1014)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/12/29 17:59:06 INFO mapred.JobClient: Task Id : attempt_201312261425_0013_m_000001_0, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1014)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/12/29 17:59:11 INFO mapred.JobClient: Task Id : attempt_201312261425_0013_m_000000_1, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1014)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/12/29 17:59:11 INFO mapred.JobClient: Task Id : attempt_201312261425_0013_m_000001_1, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1014)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/12/29 17:59:17 INFO mapred.JobClient: Task Id : attempt_201312261425_0013_m_000000_2, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1014)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/12/29 17:59:17 INFO mapred.JobClient: Task Id : attempt_201312261425_0013_m_000001_2, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1014)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/12/29 17:59:22 INFO mapred.JobClient: Job complete: job_201312261425_0013
13/12/29 17:59:22 INFO mapred.JobClient: Counters: 7
13/12/29 17:59:22 INFO mapred.JobClient: Job Counters
13/12/29 17:59:22 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=42572
13/12/29 17:59:22 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/29 17:59:22 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/12/29 17:59:22 INFO mapred.JobClient: Launched map tasks=8
13/12/29 17:59:22 INFO mapred.JobClient: Data-local map tasks=8
13/12/29 17:59:22 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/12/29 17:59:22 INFO mapred.JobClient: Failed map tasks=1
13/12/29 17:59:22 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201312261425_0013_m_000000
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1327)
at GatkWordCount.main(GatkWordCount.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

请建议我的代码需要更改哪些内容才能正确执行。感谢您的帮助。

最佳答案

在这个例子中你没有指定映射器类,你不能使用默认的身份映射器。

这是因为您指定的 TextInputFormat 生成 LongWritable(行号)作为键,Text(行内容)作为值。所以默认的身份映射器将忠实地发出 Longwritable 和 Text 不变。由于您已将 outputKeyClass 指定为 Text,因此映射器 (LongWritable) 的键输出与映射器输出整理系统 (Text) 期望的键类型之间存在类不匹配。您也会在值字段上遇到类似的不匹配,但系统首先在键字段上失败。

要解决此问题,您必须编写自己的映射器类,它接受一个 LongWritable,Text 并输出一个 Text,Intwritable。

编辑:我刚刚仔细查看了您的代码。您只是使用 mapreduce 框架在 reducer 中执行 java jar,这似乎明显违反了 Hadoop 的精神(使用 MapReduce 在 HDFS 数据上进行计算)。实际上,我会重新检查您到底想用这个应用程序做什么,而不是花更多时间尝试让它在 mapreduce 中工作。

关于hadoop - 使用hadoop执行jar文件,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/20840357/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com