
Hadoop suddenly loses its connection to the server

Reposted. Author: 可可西里. Updated: 2023-11-01 16:12:18

I am running a Hadoop job that takes several hours, but it suddenly stopped for a reason I don't know, giving the following error:

HadoopTree.mapredUtils.JobResultException: //0/0/0/0 could not be properly divided by SplitSamples
at HadoopTree.TTrain.TreeTrainer_sp$SplitSamplesListener.stateChanged(TreeTrainer_sp.java:335)
at HadoopTree.mapredUtils.JobResultManager.poll(JobResultManager.java:76)
at HadoopTree.TTrain.TreeTrainer_sp.developTree(TreeTrainer_sp.java:577)
at HadoopTree.apps.MainTrainTree.run(MainTrainTree.java:64)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
at HadoopTree.apps.MainTrainTree.main(MainTrainTree.java:26)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
at HadoopTree.apps.Driver.main(Driver.java:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:192)

I checked the error logs, and I found that before the error appeared, these were the normal system log messages written to the secondary NameNode's log file:

2015-02-18 08:35:11,834 INFO org.apache.hadoop.security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
2015-02-18 08:35:12,010 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2015-02-18 08:35:12,014 WARN org.apache.hadoop.conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
2015-02-18 08:35:12,060 WARN org.apache.hadoop.conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2015-02-18 08:35:12,089 INFO org.apache.hadoop.mapred.Task: Task:attempt_201502172051_0618_r_000003_0 is done. And is in the process of commiting
2015-02-18 08:35:12,091 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201502172051_0618_r_000003_0' done.

When the error occurred, this is what was written to the secondary NameNode's log file:

2015-02-18 09:55:08,962 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 0 time(s).
2015-02-18 09:55:09,963 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 1 time(s).
2015-02-18 09:55:10,963 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 2 time(s).
2015-02-18 09:55:11,964 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 3 time(s).
2015-02-18 09:55:12,965 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 4 time(s).
2015-02-18 09:55:13,965 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 5 time(s).
2015-02-18 09:55:14,966 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 6 time(s).
2015-02-18 09:55:15,966 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 7 time(s).
2015-02-18 09:55:16,967 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 8 time(s).
2015-02-18 09:55:17,968 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:54310. Already tried 9 time(s).
2015-02-18 09:55:17,968 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
2015-02-18 09:55:17,968 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.net.ConnectException: Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:932)
at org.apache.hadoop.ipc.Client.call(Client.java:908)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
at com.sun.proxy.$Proxy4.getEditLogSize(Unknown Source)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:417)
at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:207)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1025)
at org.apache.hadoop.ipc.Client.call(Client.java:885)
... 4 more

2015-02-18 10:00:18,970 INFO org.apache.hadoop.ipc.Client: Retrying connect
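The repeated "Connection refused" retries above mean nothing was listening on the NameNode's RPC endpoint (localhost:54310 here) — i.e. the NameNode process itself had most likely died, rather than the network failing. A minimal sketch (assumes the host/port from the log; not part of any Hadoop API) to check whether that port is accepting connections:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening, mirroring the ipc.Client failure above.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# The NameNode RPC endpoint taken from the log above (hypothetical here):
print("NameNode port open:", port_open("localhost", 54310))
```

If this returns False on the NameNode host itself, the next step is the NameNode's own log, not the secondary's — the secondary is only reporting that its checkpoint RPC could not connect.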

I also found this error in the NameNode's log file:

java.io.IOException: File /jobtracker/jobsInfo/job_201502172051_0597.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1448)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:690)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1344)

Best answer

Judging from the exception in the NameNode log, the NameNode could not find enough DataNodes (at least 1) to replicate the blocks of the file job_201502172051_0597.info. Check the DataNode logs to see whether anything went wrong there.
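To follow that advice, two checks are usually run on the cluster: `jps` (to see whether the NameNode/DataNode JVMs are still up) and `hadoop dfsadmin -report` (to count live vs. dead DataNodes; this is the Hadoop 1.x-era command matching the logs above). A small sketch that wraps such commands — the command list is an assumption, and the `runner` parameter is a hypothetical hook so the helper can be exercised without a cluster:

```python
import subprocess

# Diagnostic commands (assumption: Hadoop 1.x CLI, consistent with the logs).
# Run these on a cluster node, not locally.
DIAGNOSTIC_COMMANDS = [
    ["jps"],                            # are NameNode/DataNode JVMs running?
    ["hadoop", "dfsadmin", "-report"],  # live vs. dead DataNodes
]

def run_diagnostics(commands, runner=subprocess.run):
    """Run each command, collecting (command, returncode, stdout) tuples."""
    results = []
    for cmd in commands:
        proc = runner(cmd, capture_output=True, text=True)
        results.append((cmd, proc.returncode, proc.stdout))
    return results
```

If `dfsadmin -report` shows 0 live DataNodes, that directly explains the "could only be replicated to 0 nodes, instead of 1" exception.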

Regarding "Hadoop suddenly loses its connection to the server", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28573537/
