python-2.7 - java.io.IOException: Broken pipe when increasing the number of mappers/reducers

Reposted by: 可可西里 | Updated: 2023-11-01 14:38:34

I am running a MapReduce job on a 6-node Hadoop cluster, configured with 4 map tasks and 10 reduce tasks.

Many mapper/reducer attempts fail when I increase the number of map/reduce tasks, as shown below:

[Image: Task running on multiple nodes]

I get the following errors:

stderr log

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 143
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:130)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

And this:

syslog

2014-03-01 15:11:30,118 WARN org.apache.hadoop.streaming.PipeMapRed: java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:260)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at java.io.DataOutputStream.flush(DataOutputStream.java:106)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:569)
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:130)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

2014-03-01 15:11:30,118 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2014-03-01 15:11:30,121 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-03-01 15:11:30,146 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2014-03-01 15:11:30,146 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hduser for UID 1001 from the native implementation
2014-03-01 15:11:30,147 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 143
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:130)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2014-03-01 15:11:30,149 WARN org.apache.hadoop.mapred.Task: Parent died.  Exiting attempt_201402281751_0042_r_000004_0
2014-03-01 15:11:31,252 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=983976/1957694
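
A note on the exit code in the traces above: under the common Unix convention, an exit status above 128 means the process was ended by signal number (status - 128). Read that way, 143 would be signal 15 (SIGTERM), which suggests the streaming subprocess was terminated from outside rather than crashing by itself. This is an interpretation and not something the log states. A minimal sketch for decoding such codes follows; the helper name `describe_exit_code` is hypothetical.

```python
from __future__ import print_function

import signal

# Map signal numbers to names using the constants the signal module exposes.
# Names such as SIG_DFL and SIG_IGN are handlers, not signals, so skip them.
SIGNAL_NAMES = dict(
    (int(getattr(signal, name)), name)
    for name in dir(signal)
    if name.startswith("SIG") and not name.startswith("SIG_")
)


def describe_exit_code(code):
    """Describe a shell-style exit code (hypothetical helper for illustration)."""
    if code > 128:
        signum = code - 128
        name = SIGNAL_NAMES.get(signum, "unknown signal")
        return "killed by signal %d (%s)" % (signum, name)
    return "exited with status %d" % code


print(describe_exit_code(143))  # killed by signal 15 (SIGTERM)
```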

Even the simplest program produces this problem:

mapper.py

#!/usr/bin/env python

import sys
for line in sys.stdin:
    if line:
        print "%s\n%s"%(line, line)

reducer.py

#!/usr/bin/env python

import sys
for line in sys.stdin:
    if line:
        print "%s"%(line)

I am using the following command to run Hadoop:

hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.0.3.jar -D mapred.reduce.tasks=10 -file /home/hduser/code/K1D/code1/mapper2.py -mapper mapper2.py -file /home/hduser/code/K1D/code1/reducer2.py -reducer reducer2.py -input /user/hduser/data-out/part-00000 -output /user/hduser/data-out1 -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner

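Do you have any suggestions?

Best answer

I ran into something similar and realized that one of my data nodes did not have the exact version of Python installed. I had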

#!/usr/bin/env python

and changed it to

#!/usr/bin/env python2.7
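
One way to check this diagnosis on your own cluster is to run a throwaway mapper that reports which interpreter each task actually uses. The sketch below is an assumption-laden illustration, not part of the original answer: the function names `interpreter_report` and `run_mapper` are hypothetical, and the code is written to run under both Python 2.7 and Python 3. It reads all of its input before writing anything, so the process feeding it never writes into a closed pipe.

```python
from __future__ import print_function

import socket
import sys


def interpreter_report():
    """Return 'hostname<TAB>version<TAB>executable' for the current process."""
    version = "%d.%d.%d" % tuple(sys.version_info[:3])
    return "%s\t%s\t%s" % (socket.gethostname(), version, sys.executable)


def run_mapper(lines, out):
    """Consume every input line, then write one report line to `out`."""
    count = 0
    for _ in lines:  # drain the input fully before producing output
        count += 1
    # The last field is the number of input lines this task received.
    print("%s\t%d" % (interpreter_report(), count), file=out)
    return count


# Local demonstration with an in-memory list standing in for stdin.
run_mapper(["a\n", "b\n", "c\n"], sys.stdout)
```

To use it as a streaming mapper, replace the demonstration call with `run_mapper(sys.stdin, sys.stdout)` and add the same shebang line as your real scripts. Nodes whose reported version differs from the rest, or whose tasks fail before reporting anything, are the ones to inspect first.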

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/22113000/
