Python Hadoop streaming error "ERROR streaming.StreamJob: Job not Successful!" and stack trace: ExitCodeException exitCode=134

Reposted. Author: 可可西里. Updated: 2023-11-01 14:23:24

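I am trying to run a Python sentiment-analysis script on a Hadoop cluster using Hadoop Streaming. The same script runs correctly on my local machine and produces output.
To run it locally, I use this command: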

$ cat /home/MB/analytics/Data/input/* | ./new_mapper.py
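That local command only exercises the mapper. Hadoop Streaming also sorts the mapper's output by key (the text before the first tab) and pipes the sorted lines through the reducer, so a closer local check is map, then sort, then reduce. Here is a minimal sketch of that flow. `run_streaming_locally` and `python_cmd` are hypothetical helper names, and the sketch does not emulate partitioning across several reducers.

```python
import subprocess
import sys


def python_cmd(script_path):
    """Command that runs a script with the current interpreter, like -mapper "python x.py"."""
    return [sys.executable, script_path]


def run_streaming_locally(mapper_cmd, reducer_cmd, input_text):
    """Approximate one Hadoop Streaming job on a single machine.

    Mapper output lines are sorted by key (text before the first tab) before
    being piped to the reducer. check=True raises CalledProcessError if either
    script exits with a non-zero status.
    """
    # Map phase: the raw input goes to the mapper's stdin.
    mapped = subprocess.run(mapper_cmd, input=input_text, capture_output=True,
                            text=True, check=True).stdout

    # Sort phase: order the mapper's output lines by key.
    lines = mapped.splitlines(keepends=True)
    lines.sort(key=lambda line: line.split("\t", 1)[0])

    # Reduce phase: the sorted lines go to the reducer's stdin.
    return subprocess.run(reducer_cmd, input="".join(lines), capture_output=True,
                          text=True, check=True).stdout
```

For example, `run_streaming_locally(python_cmd("new_mapper.py"), python_cmd("new_reducer.py"), text)` runs both scripts over `text`, which could be the contents of a hypothetical sample input file. Because of `check=True`, a script that crashes fails loudly here instead of only inside a cluster task.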

To run it on the Hadoop cluster, I use the following command:

$ hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.2.0.jar -mapper "python $PWD/new_mapper.py" -reducer "$PWD/new_reducer.py" -input /user/hduser/Test_04012015_Data/input/* -output /user/hduser/python-mr/out-mr-out

A sample of my script:

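#!/usr/bin/env python
import re
import sys

# classifier, feature_select, referenceSets and testSets are not shown in
# this sample; the full script must define them.


def main(argv):
    i = 0  # running record counter (previously used without being initialized)
    for line in sys.stdin:
        line = line.rstrip('\n').split(',')
        t_text = re.sub(r'[?|$|.|!|,|!|?|;]', r'', line[7])
        words = re.findall(r"[\w']+", t_text.rstrip())
        predicted = classifier.classify(feature_select(words))
        i = i + 1
        referenceSets[predicted].add(i)
        testSets[predicted].add(i)
        # Emit "text<TAB>label" on stdout, the record format Streaming expects.
        print(line[7] + '\t' + predicted)


if __name__ == "__main__":
    main(sys.argv)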
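A mapper that indexes `line[7]` unconditionally raises IndexError on the first record with fewer than eight comma-separated fields. On a cluster, that surfaces only as a failed task attempt. The following sketch is a more defensive variant that assumes the same CSV layout. `classify_words` is a hypothetical placeholder for the question's `classifier.classify(feature_select(words))`, and `map_lines` and `main` are illustrative names.

```python
import re
import sys

PUNCT_RE = re.compile(r"[?|$.!,;]")  # same character set as the question's regex, duplicates removed
WORD_RE = re.compile(r"[\w']+")
TEXT_FIELD = 7  # the question takes the text from the 8th comma-separated column


def classify_words(words):
    """Hypothetical stand-in for classifier.classify(feature_select(words))."""
    return "pos" if "good" in words else "neg"


def map_lines(lines, classify=classify_words):
    """Yield 'text<TAB>label' records, skipping malformed lines instead of crashing."""
    for lineno, line in enumerate(lines, start=1):
        # Naive split, as in the question; quoted commas inside a field are not handled.
        fields = line.rstrip("\n").split(",")
        if len(fields) <= TEXT_FIELD:
            # stderr ends up in the task logs; stdout is reserved for records.
            sys.stderr.write("skipping line %d: %d fields\n" % (lineno, len(fields)))
            continue
        text = fields[TEXT_FIELD]
        words = WORD_RE.findall(PUNCT_RE.sub("", text))
        yield "%s\t%s" % (text, classify(words))


def main():
    # Entry point for new_mapper.py: read records from stdin, write them to stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

To use it as new_mapper.py, end the file with the usual `if __name__ == "__main__": main()` guard. Keeping the per-line logic in a generator also makes it easy to test without Hadoop.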

The stack trace of the exception is:

    15/04/22 12:55:14 INFO mapreduce.Job: Task Id : attempt_1429611942931_0010_m_000001_0, Status : FAILED
Error: java.io.IOException: Stream closed at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:434)
...

Exit code: 134
Exception message: /bin/bash: line 1: 1691 Aborted
(core dumped) /usr/lib/jvm/java-7-oracle-cloudera/bin/java
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.net.preferIPv4Stack=true -Xmx525955249
-Djava.io.tmpdir=/yarn/nm/usercache/hduser/appcache/application_1429611942931_0010/container_1429611942931_0010_01_000016/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1429611942931_0010/container_1429611942931_0010_01_000016 -Dyarn.app.container.log.filesize=0
-Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 192.168.0.122 48725 attempt_1429611942931_0010_m_000006_1 16 > /var/log/hadoop-yarn/container/application_1429611942931_0010/container_1429611942931_0010_01_000016/stdout 2> /var/log/hadoop-yarn/container/application_1429611942931_0010/container_1429611942931_0010_01_000016/stderr
....

15/04/22 12:55:47 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
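The exit code in this log can be decoded. By the usual shell convention, a status above 128 means the process was terminated by signal (status - 128). So 134 corresponds to signal 6, which is SIGABRT on Linux and matches the `Aborted (core dumped)` line. A `java.io.IOException: Stream closed` in a streaming task typically means the external mapper process exited before Java finished writing input to it. Here is a minimal sketch of the decoding. `describe_exit_code` is a hypothetical helper, and signal numbers are platform-specific.

```python
import signal


def describe_exit_code(status):
    """Interpret a shell-style exit status such as YARN's 'Exit code: 134'."""
    if status > 128:
        # Shell convention: 128 + N means the process was killed by signal N.
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal %d (%s)" % (signum, name)
    return "exited with status %d" % status
```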

I tried to look at the logs, but Hue showed an error instead. Please advise what is going wrong.

Best answer

It looks like you forgot to add the file new_mapper.py to your job.

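Basically, your job tries to run the Python script new_mapper.py, but that script is missing on the servers that run the mappers. Because $PWD is expanded by your local shell, every task looks for the script at a path that exists only on the machine you submitted the job from.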

You have to add this file to your job with the option -file <local_path_to_your_file> (the same applies to new_reducer.py).
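Applied to the question's command, the fix might look like the following sketch. `build_streaming_command` and `format_command` are hypothetical helpers, and the paths in the example are placeholders. The key point is that `-file` ships each local script into the task's working directory. As a result, `-mapper` and `-reducer` name the bare file rather than a `$PWD` path.

```python
import os
import shlex


def build_streaming_command(jar, mapper_path, reducer_path, input_path, output_path):
    """Return the argv for a hadoop-streaming job that ships both scripts with it."""
    # Inside each task the shipped scripts sit in the working directory,
    # so the -mapper/-reducer commands use only the file names.
    mapper_name = os.path.basename(mapper_path)
    reducer_name = os.path.basename(reducer_path)
    return [
        "hadoop", "jar", jar,
        "-file", mapper_path,    # copy the local mapper script to the nodes
        "-file", reducer_path,   # same for the reducer script
        "-mapper", "python " + mapper_name,
        "-reducer", "python " + reducer_name,
        "-input", input_path,
        "-output", output_path,
    ]


def format_command(argv):
    """Quote the argv so it can be pasted into a POSIX shell."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

You can print `format_command(cmd)` to inspect the command, or pass the list to `subprocess.run(cmd, check=True)` to submit it. Quoting also keeps the local shell from expanding an HDFS glob such as `input/*`, leaving the glob for Hadoop to resolve.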

See the documentation and examples here: https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopStreaming.html#Streaming_Command_Options

A similar question, "Python Hadoop streaming error "ERROR streaming.StreamJob: Job not Successful!" and stack trace: ExitCodeException exitCode=134", can be found on Stack Overflow: https://stackoverflow.com/questions/29791437/
