gpt4 book ai didi

hadoop - 在 yarn-cluster 模式下运行 Spark 时出错(应用程序返回退出代码 1)

转载 作者:可可西里 更新时间:2023-11-01 14:51:30 25 4
gpt4 key购买 nike

我有一个 SPARK 作业不断返回退出代码 1,我无法弄清楚这个特定退出代码的含义以及应用程序返回此代码的原因。这是我在节点管理器日志中看到的 -

2017-07-10 07:54:03,839 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1499673023544_0001_01_000001 and exit code: 1
ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1499673023544_0001_01_000001
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 1
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=1:
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:456)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2017-07-10 07:54:03,843 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.lang.Thread.run(Thread.java:745)
2017-07-10 07:54:03,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1
2017-07-10 07:54:03,846 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1499673023544_0001_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2017-07-10 07:54:03,846 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1499673023544_0001_01_000001

当我检查特定应用程序(和容器)的日志时,它没有返回任何特定的堆栈跟踪或错误消息。这是我在作业终止时在容器日志 (stderr) 中看到的内容。

INFO impl.ContainerManagementProtocolProxy: Opening proxy : myplayground:52311
17/07/10 07:54:02 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. myplayground:36322
17/07/10 07:54:03 INFO cluster.YarnClusterSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@myplayground:49562/user/Executor#509101946]) with ID 1
17/07/10 07:54:03 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
17/07/10 07:54:03 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
17/07/10 07:54:03 ERROR yarn.ApplicationMaster: User application exited with status 1
17/07/10 07:54:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
17/07/10 07:54:03 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
17/07/10 07:54:03 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
17/07/10 07:54:03 INFO ui.SparkUI: Stopped Spark web UI at http://x.x.x.x:37961
17/07/10 07:54:03 INFO scheduler.DAGScheduler: Stopping DAGScheduler
17/07/10 07:54:03 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
17/07/10 07:54:03 INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down
17/07/10 07:54:03 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/07/10 07:54:03 INFO storage.MemoryStore: MemoryStore cleared
17/07/10 07:54:03 INFO storage.BlockManager: BlockManager stopped
17/07/10 07:54:03 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/07/10 07:54:03 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/07/10 07:54:03 INFO spark.SparkContext: Successfully stopped SparkContext
17/07/10 07:54:03 INFO util.ShutdownHookManager: Shutdown hook called
17/07/10 07:54:03 INFO util.ShutdownHookManager: Deleting directory /tmp/Hadoop-hadoop/nm-local-dir/usercache/myprdusr/appcache/application_1499673023544_0001/spark-2adeda9f-9244-4519-b87f-ec895a50cfcd
17/07/10 07:54:03 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/07/10 07:54:03 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

因此,在这两个日志中,我所看到的只是应用程序以退出代码 1 退出。谁能告诉我这个特定错误代码的含义以及 Yarn 抛出此异常的可能原因?

最佳答案

我终于能够解决这个问题。发生的事情是我调用 spark-submit 的 bash 脚本向它传递了一个无效参数。当作业开始时,一个名为 launch_container.sh 的脚本将执行 org.apache.spark.deploy.yarn.ApplicationMaster 并将参数传递给 spark-submit 并且 ApplicationMaster 返回退出代码 1当它的任何参数无效时。

更多信息 here

关于hadoop - 在 yarn-cluster 模式下运行 Spark 时出错(应用程序返回退出代码 1),我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/45007772/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com