
apache-spark - Lost executors in Spark

Reposted. Author: 行者123. Updated: 2023-12-02 11:17:08

I have a long-running job on Spark that fails after running for several hours with the following errors.

18/10/09 03:22:15 ERROR YarnScheduler: Lost executor 547 on ip: Unable to create executor due to Unable to register with external shuffle server due to : java.util.concurrent.TimeoutException: Timeout waiting for task.
18/10/09 03:22:15 WARN TaskSetManager: Lost task 750.0 in stage 19.0 (TID 1565492, ip, executor 547): ExecutorLostFailure (executor 547 exited caused by one of the running tasks) Reason: Unable to create executor due to Unable to register with external shuffle server due to : java.util.concurrent.TimeoutException: Timeout waiting for task.
18/10/09 03:22:15 WARN TaskSetManager: Lost task 752.0 in stage 19.0 (TID 1565494, ip, executor 547): ExecutorLostFailure (executor 547 exited caused by one of the running tasks) Reason: Unable to create executor due to Unable to register with external shuffle server due to : java.util.concurrent.TimeoutException: Timeout waiting for task.
18/10/09 03:22:15 WARN TaskSetManager: Lost task 751.0 in stage 19.0 (TID 1565493, ip, executor 547): ExecutorLostFailure (executor 547 exited caused by one of the running tasks) Reason: Unable to create executor due to Unable to register with external shuffle server due to : java.util.concurrent.TimeoutException: Timeout waiting for task.
18/10/09 03:22:15 WARN TaskSetManager: Lost task 754.0 in stage 19.0 (TID 1565496, ip, executor 547): ExecutorLostFailure (executor 547 exited caused by one of the running tasks) Reason: Unable to create executor due to Unable to register with external shuffle server due to : java.util.concurrent.TimeoutException: Timeout waiting for task.
18/10/09 03:22:15 WARN TaskSetManager: Lost task 753.0 in stage 19.0 (TID 1565495, ip, executor 547): ExecutorLostFailure (executor 547 exited caused by one of the running tasks) Reason: Unable to create executor due to Unable to register with external shuffle server due to : java.util.concurrent.TimeoutException: Timeout waiting for task.
18/10/09 03:22:15 ERROR YarnScheduler: Lost executor 572 on ip: Unable to create executor due to Unable to register with external shuffle server due to : java.util.concurrent.TimeoutException: Timeout waiting for task.
18/10/09 03:22:15 INFO DAGScheduler: Executor lost: 547 (epoch 45)
18/10/09 03:22:15 WARN TaskSetManager: Lost task 756.0 in stage 19.0 (TID 1565498, ip, executor 572): ExecutorLostFailure (executor 572 exited caused by one of the running tasks) Reason: Unable to create executor due to Unable to register with external shuffle server due to : java.util.concurrent.TimeoutException: Timeout waiting for task.
...

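The strange thing is that I can't even see the lost executors in the "Executors" list of the logs.

It would be great if someone could help resolve this issue.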

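Best answer

Many factors can cause this, but in summary:

Your master node is unable to respond to a particular executor, which is why you get the error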

Unable to register with external shuffle server due to



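There can be different reasons why your master node is unable to respond. It depends on how your code is structured and, if you are using EMR, on the size of your instances.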

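To solve it
  • Scale up your master node. For example, if you are using i3.4xlarge, switch to i3.8xlarge or even i3.16xlarge.
  • Increase the network timeout from 2 minutes to 5 minutes. This is done with the following Spark configuration (the value is in seconds): spark.network.timeout = 300
  • Increase the memory and the number of cores of the master node. To increase the number of cores, set the following configuration: spark.yarn.am.cores = 3

Hope this solves the problem.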
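As a rough illustration of how the two configuration values above might be passed to a job, here is a minimal sketch that builds a `spark-submit` command line. The helper name `build_spark_submit_args`, the `SUGGESTED_CONF` dictionary, and the script name `my_job.py` are hypothetical and not part of the original answer. The timeout is written as `300s`, which is assumed to be equivalent to the bare `300` quoted above. Note that `spark.yarn.am.cores` only applies in YARN client mode; in cluster mode the corresponding setting is `spark.driver.cores`.

```python
# Hypothetical helper: turn the settings suggested in the answer into
# spark-submit arguments. The values are the ones quoted above, not tuned ones.
SUGGESTED_CONF = {
    "spark.network.timeout": "300s",  # raised from the 120s default
    "spark.yarn.am.cores": "3",       # YARN client mode only
}


def build_spark_submit_args(app_path, conf=None):
    """Return an argv list for spark-submit with one --conf per setting."""
    conf = SUGGESTED_CONF if conf is None else conf
    args = ["spark-submit", "--master", "yarn"]
    # Sort keys so the generated command line is deterministic.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


if __name__ == "__main__":
    # "my_job.py" is a placeholder for the actual application script.
    print(" ".join(build_spark_submit_args("my_job.py")))
```

The same keys can instead be placed in `spark-defaults.conf` or set on the session builder; which route is appropriate depends on how the job is launched.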

Regarding apache-spark - Lost executors in Spark, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52742728/
