gpt4 book ai didi

Hadoop 上次减少作业卡在连接超时

转载 作者:可可西里 更新时间:2023-11-01 14:32:11 26 4
gpt4 key购买 nike

我的 hadoop 作业卡在了最后的 reduce 任务上。我已经看到很多从 3 个不同主机到单个主机的连接超时。但是,我能够从任何其他机器 ping 有问题的机器。

这是一个 5 节点集群。它是最近 build 的。它们具有相同的 hadoop 和 pig 二进制文件。它有 3 台新机器和 2 台旧机器。如果我移除 2 台旧机器,它工作正常。

有问题的旧机器版本:

java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.13) (6b20-1.9.13-0ubuntu1~10.10.1)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

新工作机版本:

java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.3) (6b24-1.11.3-1ubuntu0.12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

我还从日志中看到了数量惊人的 (180) 个重复主机。

2012-08-28 12:34:53,307 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Need another 4 map output(s) where 1 is already in progress
2012-08-28 12:34:53,307 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Scheduled 0 outputs (0 slow hosts and3 dup hosts)
2012-08-28 12:35:53,308 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Need another 4 map output(s) where 1 is already in progress
2012-08-28 12:35:53,308 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Scheduled 0 outputs (0 slow hosts and3 dup hosts)
2012-08-28 12:36:53,309 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Need another 4 map output(s) where 1 is already in progress
2012-08-28 12:36:53,309 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Scheduled 0 outputs (0 slow hosts and3 dup hosts)
2012-08-28 12:37:18,416 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 copy failed: attempt_201208231557_0021_m_000002_0 from Han
2012-08-28 12:37:18,417 WARN org.apache.hadoop.mapred.ReduceTask: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:546)
at sun.net.NetworkClient.doConnect(NetworkClient.java:173)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:409)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:240)
at sun.net.www.http.HttpClient.New(HttpClient.java:321)
at sun.net.www.http.HttpClient.New(HttpClient.java:338)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:935)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:801)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1618)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1575)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1483)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1394)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1326)

2012-08-28 12:37:18,417 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201208231557_0021_r_000008_0: Failed fetch #6 from attempt_201208231557_0021_m_000002_0
2012-08-28 12:37:18,417 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 adding host Han to penalty box, next contact in 48 seconds
2012-08-28 12:37:18,417 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0: Got 1 map-outputs from previous failures
2012-08-28 12:37:53,418 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Need another 4 map output(s) where 0 is already in progress
2012-08-28 12:37:53,418 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Scheduled 0 outputs (1 slow hosts and0 dup hosts)
2012-08-28 12:37:53,418 INFO org.apache.hadoop.mapred.ReduceTask: Penalized(slow) Hosts:
2012-08-28 12:37:53,418 INFO org.apache.hadoop.mapred.ReduceTask: Han Will be considered after: 13 seconds.
2012-08-28 12:38:08,418 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2012-08-28 12:38:53,419 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Need another 4 map output(s) where 1 is already in progress
2012-08-28 12:38:53,419 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Scheduled 0 outputs (0 slow hosts and3 dup hosts)
2012-08-28 12:39:53,420 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Need another 4 map output(s) where 1 is already in progress
2012-08-28 12:39:53,420 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Scheduled 0 outputs (0 slow hosts and3 dup hosts)
2012-08-28 12:40:53,421 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Need another 4 map output(s) where 1 is already in progress
2012-08-28 12:40:53,421 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Scheduled 0 outputs (0 slow hosts and3 dup hosts)
2012-08-28 12:41:08,537 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 copy failed: attempt_201208231557_0021_m_000002_0 from Han
2012-08-28 12:41:08,538 WARN org.apache.hadoop.mapred.ReduceTask: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:546)
at sun.net.NetworkClient.doConnect(NetworkClient.java:173)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:409)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:240)
at sun.net.www.http.HttpClient.New(HttpClient.java:321)
at sun.net.www.http.HttpClient.New(HttpClient.java:338)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:935)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:801)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1618)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1575)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1483)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1394)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1326)

2012-08-28 12:41:08,538 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201208231557_0021_r_000008_0: Failed fetch #7 from attempt_201208231557_0021_m_000002_0
2012-08-28 12:41:08,538 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 adding host Han to penalty box, next contact in 62 seconds
2012-08-28 12:41:08,538 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0: Got 1 map-outputs from previous failures
2012-08-28 12:41:53,539 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Need another 4 map output(s) where 0 is already in progress
2012-08-28 12:41:53,539 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Scheduled 0 outputs (1 slow hosts and0 dup hosts)
2012-08-28 12:41:53,539 INFO org.apache.hadoop.mapred.ReduceTask: Penalized(slow) Hosts:
2012-08-28 12:41:53,539 INFO org.apache.hadoop.mapred.ReduceTask: Han Will be considered after: 17 seconds.
2012-08-28 12:42:13,540 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2012-08-28 12:42:53,540 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Need another 4 map output(s) where 1 is already in progress
2012-08-28 12:42:53,540 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201208231557_0021_r_000008_0 Scheduled 0 outputs (0 slow hosts and3 dup hosts)

更多日志:

2012-09-06 11:01:47,719 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201209041720_0010_r_000014_0 1.0% reduce > reduce
2012-09-06 11:01:47,720 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201209041720_0010_r_000014_0 is done.
2012-09-06 11:01:47,720 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201209041720_0010_r_000014_0 was -1
2012-09-06 11:01:47,720 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 6
2012-09-06 11:01:47,769 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201209041720_0010_r_1617080101 exited with exit code 0. Number of tasks it ran: 1
2012-09-06 11:01:47,913 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201209041720_0010_r_-434797671 exited with exit code 0. Number of tasks it ran: 1
2012-09-06 11:01:50,428 INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201209041720_0010_r_000014_0
2012-09-06 11:01:50,428 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201209041720_0010_r_000014_0
2012-09-06 11:01:50,429 INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201209041720_0010_r_000020_0
2012-09-06 11:01:50,429 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201209041720_0010_r_000020_0

最佳答案

您是否使用默认的推测执行设置? (4 次尝试)- 这是一个 article这可能会有所帮助。

关于Hadoop 上次减少作业卡在连接超时,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/12166510/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com