gpt4 book ai didi

python-2.7 - java.io.IOException : Broken pipe on increasing number of mappers/reducers, 很多

转载 作者:可可西里 更新时间:2023-11-01 14:38:34 27 4
gpt4 key购买 nike

我在 6 个节点的 hadoop 集群上运行 MapReduce 作业,配置了 4 个映射任务和 10 个缩减任务。

Mapper/Reducer 在增加 map/reduce 任务数量时失败很多,如下所示,

Task running on multiple nodes

我遇到以下错误:

标准错误日志

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 143
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:130)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

还有这个:

系统日志

2014-03-01 15:11:30,118 WARN org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:260)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at java.io.DataOutputStream.flush(DataOutputStream.java:106)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:569)
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:130)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

2014-03-01 15:11:30,118 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2014-03-01 15:11:30,121 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-03-01 15:11:30,146 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2014-03-01 15:11:30,146 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hduser for UID 1001 from the native implementation
2014-03-01 15:11:30,147 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 143
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:130)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2014-03-01 15:11:30,149 WARN org.apache.hadoop.mapred.Task: Parent died. Exiting attempt_201402281751_0042_r_000004_0
2014-03-01 15:11:31,252 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=983976/1957694

即使是最简单的程序也会产生这个问题:

映射器.py

#!/usr/bin/env python

import sys
for line in sys.stdin:
if line:
print "%s\n%s"%(line, line)

reducer.py

#!/usr/bin/env python

import sys
for line in sys.stdin:
if line:
print "%s"%(line)

我正在使用以下命令来运行 hadoop。

hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.0.3.jar -D mapred.reduce.tasks=10 -file /home/hduser/code/K1D/code1/mapper2.py -mapper mapper2.py -file /home/hduser/code/K1D/code1/reducer2.py -reducer reducer2.py -input /user/hduser/data-out/part-00000 -output /user/hduser/data-out1 -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner

你有什么建议吗?

最佳答案

我遇到了类似的事情,并意识到我没有在我的一个数据节点中安装确切版本的 python。我有

#!/usr/bin/env python

我改成了

#!/usr/bin/env python2.7

关于python-2.7 - java.io.IOException : Broken pipe on increasing number of mappers/reducers, 很多,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/22113000/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com