
python - Hadoop mapreduce task fails with 143


I am currently learning to use Hadoop MapReduce and I am running into this error:

packageJobJar: [/home/hduser/mapper.py, /home/hduser/reducer.py, /tmp/hadoop-unjar4635332780289131423/] [] /tmp/streamjob8641038855230304864.jar tmpDir=null
16/10/31 17:41:12 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.55:8050
16/10/31 17:41:13 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.55:8050
16/10/31 17:41:15 INFO mapred.FileInputFormat: Total input paths to process : 1
16/10/31 17:41:17 INFO mapreduce.JobSubmitter: number of splits:2
16/10/31 17:41:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477933345919_0004
16/10/31 17:41:19 INFO impl.YarnClientImpl: Submitted application application_1477933345919_0004
16/10/31 17:41:19 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1477933345919_0004/
16/10/31 17:41:19 INFO mapreduce.Job: Running job: job_1477933345919_0004
16/10/31 17:41:38 INFO mapreduce.Job: Job job_1477933345919_0004 running in uber mode : false
16/10/31 17:41:38 INFO mapreduce.Job: map 0% reduce 0%
16/10/31 17:41:56 INFO mapreduce.Job: map 100% reduce 0%
16/10/31 17:42:19 INFO mapreduce.Job: Task Id : attempt_1477933345919_0004_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

I cannot figure out how to fix this error and have been searching the internet for a solution. The code I am using for the mapper is:

import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()

    for word in words:
        print '%s\t%s' % (word, 1)
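
Before debugging on the cluster, it can help to check the mapper on its own by piping some text into it. A minimal local sanity check (assuming the script is run from the directory containing mapper.py and that python resolves to a Python 2 interpreter, since the code uses print statements) might look like:

$ echo "foo foo bar" | python mapper.py
foo	1
foo	1
bar	1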

The code for the reducer is:

from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)

    try:
        count = int(count)
    except ValueError:
        continue

    if current_word == word:
        current_count += count
    else:
        if current_word:
            print '%s\t%s' % (current_word, current_count)
        current_count = count
        current_word = word

if current_word == word:
    print '%s\t%s' % (current_word, current_count)
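
Hadoop Streaming essentially pipes the input through the mapper, sorts the intermediate key/value lines, and feeds them to the reducer, so the whole job can be simulated locally. A rough sketch of such a test (assuming a local copy of the input file at ~/testDocument and a Python 2 interpreter):

$ cat ~/testDocument | python mapper.py | sort -k1,1 | python reducer.py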

To run the job I am using:

hduser@master:/opt/hadoop-2.7.3/share/hadoop/tools/lib $ hadoop jar hadoop-streaming-2.7.3.jar -file /home/hduser/mapper.py -mapper "python mapper.py" -file /home/hduser/reducer.py -reducer "python reducer.py" -input ~/testDocument -output ~/results1

Any help would be appreciated, as I am new to Hadoop. If more logs or information are needed, please feel free to ask.

Best answer

Check the logs to find the error coming from your Python code. On EMR/YARN you can get the logs either from the web UI or from a shell on the cluster master, as shown below (your application ID will differ from the one printed when the job starts). There is a lot of output, so redirect it to a file as shown and look for a Python stack trace.

$ yarn logs -applicationId application_1503951120983_0031 > /tmp/log 
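
Once the logs are in a file, searching for a Python traceback narrows things down quickly. For example (the file name /tmp/log matches the redirect above):

$ grep -n -A 10 "Traceback" /tmp/log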

Regarding python - Hadoop mapreduce task fails with 143, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40347638/
