
hadoop - Running a simple MapReduce Streaming job with MR1 on CDH4 fails

Reposted. Author: 可可西里. Updated: 2023-11-01 14:59:38

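I have a cluster that I recently upgraded from CDH3 to CDH4. Hive is currently working fine. However, I can't seem to get a simple MR Streaming job (MR1) to run. YARN is installed but not used. Below is the command-line input and output: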

$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.0.0.jar grep -input /input -output /output/ 'dfs[a-z.]+'

Checking the logs shows:

packageJobJar: [/tmp/hadoop-hdfs/hadoop-unjar7491355516546899751/] [] /tmp/streamjob1375201380112960182.jar tmpDir=null
12/07/12 07:26:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/07/12 07:26:29 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/12 07:26:29 WARN snappy.LoadSnappy: Snappy native library is available
12/07/12 07:26:29 INFO snappy.LoadSnappy: Snappy native library loaded
12/07/12 07:26:29 INFO mapred.FileInputFormat: Total input paths to process : 3
12/07/12 07:26:29 INFO streaming.StreamJob: getLocalDirs(): [file:////data/hadoop-0.20/cache/mapred/mapred/local]
12/07/12 07:26:29 INFO streaming.StreamJob: Running job: job_201207120604_0018
12/07/12 07:26:29 INFO streaming.StreamJob: To kill this job, run:
12/07/12 07:26:29 INFO streaming.StreamJob: UNDEF/bin/hadoop job -Dmapred.job.tracker=frost:54311 -kill job_201207120604_0018
12/07/12 07:26:29 INFO streaming.StreamJob: Tracking URL: http://alpha:50030/jobdetails.jsp?jobid=job_201207120604_0018
12/07/12 07:26:30 INFO streaming.StreamJob: map 0% reduce 0%
12/07/12 07:26:57 INFO streaming.StreamJob: map 100% reduce 100%
12/07/12 07:26:57 INFO streaming.StreamJob: To kill this job, run:
12/07/12 07:26:57 INFO streaming.StreamJob: UNDEF/bin/hadoop job -Dmapred.job.tracker=frost:54311 -kill job_201207120604_0018
12/07/12 07:26:57 INFO streaming.StreamJob: Tracking URL: http://alpha:50030/jobdetails.jsp?jobid=job_201207120604_0018
12/07/12 07:26:57 ERROR streaming.StreamJob: Job not successful. Error: NA
12/07/12 07:26:57 INFO streaming.StreamJob: killJob...
Streaming Command Failed!

The logs are full of failed task attempts like this one:

2012-07-12 07:26:46,785 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201207120604_0018_m_000001_2: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:861)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:501)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:264)

Best answer

Can you look at the job.xml generated for the job you submitted (via the JobTracker web interface) and check what value is defined for the mapper (the mapreduce.map.class property)?
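Once job.xml has been downloaded from the JobTracker page, a small script can pull out just the mapper-related entries. This is a minimal sketch, assuming job.xml uses the standard Hadoop configuration layout (`<property>` elements with `<name>` and `<value>` children); the script name `job_mapper.py` is hypothetical. Besides `mapreduce.map.class`, it also reports `mapred.mapper.class` and `stream.map.streamprocessor` if present; treating those two keys as relevant is an assumption, not something stated in the answer.

```python
import sys
import xml.etree.ElementTree as ET

# mapreduce.map.class is the property named in the answer; the other two keys
# are ASSUMED to be worth checking as well (old-API mapper, streaming command).
MAPPER_KEYS = (
    "mapreduce.map.class",
    "mapred.mapper.class",
    "stream.map.streamprocessor",
)


def read_job_conf(xml_data) -> dict:
    """Parse a Hadoop configuration XML document into a {name: value} dict."""
    root = ET.fromstring(xml_data)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            conf[name] = value if value is not None else ""
    return conf


def mapper_settings(conf: dict) -> dict:
    """Return only the mapper-related properties that are actually set."""
    return {key: conf[key] for key in MAPPER_KEYS if key in conf}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read as bytes so an XML encoding declaration is handled by the parser.
    with open(sys.argv[1], "rb") as f:
        settings = mapper_settings(read_job_conf(f.read()))
    for key, value in settings.items():
        print(f"{key} = {value}")
```

Usage, assuming the file was saved locally as job.xml: python3 job_mapper.py job.xml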

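Judging from the map log you included, you may have the identity mapper configured (which would explain why LongWritable shows up as the output key instead of the Text key type the job defines):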

Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
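The reasoning can be illustrated with a small Python simulation (not Hadoop code), assuming the job uses streaming's default TextInputFormat: each line is keyed by its byte offset, which Hadoop stores as a LongWritable, and an identity mapper passes that key through unchanged, so a job that declares Text map-output keys rejects it. All function names here are hypothetical, and Python `int`/`str` stand in for LongWritable/Text.

```python
def text_input_records(data: bytes):
    """Mimic TextInputFormat: key = byte offset of the line start, value = line text."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # next key is the offset of the following line


def identity_mapper(key, value):
    """Like IdentityMapper: emit the input pair unchanged."""
    yield key, value


def collect(pairs, expected_key_type):
    """Mimic the map-output key check that produces 'Type mismatch in key from map'."""
    out = []
    for key, value in pairs:
        if not isinstance(key, expected_key_type):
            raise TypeError(
                "Type mismatch in key from map: "
                f"expected {expected_key_type.__name__}, received {type(key).__name__}"
            )
        out.append((key, value))
    return out
```

Feeding the identity mapper's output into `collect(..., str)` raises a mismatch of the same shape as the log line above, while `collect(..., int)` accepts it, because the offset key is never converted to text.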

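If that is the case, you will need to look at the Streaming code for Hadoop 2.0.0 (which I don't have at hand right now) to see how launching the hadoop-streaming-2.0.0-mr1-cdh4.0.0.jar jar with your arguments configures and runs the job.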
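For comparison, the streaming jar runs whatever commands are passed with -mapper and -reducer, whereas a bare `grep` program name followed by a regex is how the Hadoop examples jar is invoked. One way to get the same regex count as a streaming job, assuming Python 3 is installed on every task node, is sketched below; the script name `grep_streaming.py` and its `map`/`reduce` mode argument are hypothetical. Because the mapper prints a text key for each match, the map output key is no longer a LongWritable offset passed through unchanged.

```python
#!/usr/bin/env python3
import re
import sys

# Same pattern as in the question's command line.
PATTERN = re.compile(r"dfs[a-z.]+")


def map_lines(lines):
    """Mapper: emit 'match<TAB>1' for every regex match in every input line."""
    for line in lines:
        for match in PATTERN.findall(line):
            yield f"{match}\t1"


def reduce_lines(lines):
    """Reducer: sum counts per key; streaming delivers reducer input sorted by key."""
    current, total = None, 0
    for line in lines:
        key, _, count = line.rstrip("\n").partition("\t")
        if key != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = key, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Mode argument (hypothetical convention): "map" or "reduce".
    step = {"map": map_lines, "reduce": reduce_lines}[sys.argv[1]]
    for out in step(sys.stdin):
        print(out)
```

A possible invocation, shipping the script with -file:

$ /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.0.0.jar -input /input -output /output/ -mapper 'python3 grep_streaming.py map' -reducer 'python3 grep_streaming.py reduce' -file grep_streaming.py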

A similar question about "hadoop - Running a simple MapReduce Streaming job with MR1 on CDH4 fails" can be found on Stack Overflow: https://stackoverflow.com/questions/11452158/
