- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
我正在尝试使用最新的 EMR 加载具有 1TB 数据的数据库以在 AWS 上触发。而且运行时间太长,甚至 6 小时内都没有完成,但是在运行 6h30m 之后,我收到一些错误,宣布 Container 在丢失的节点上发布,然后作业失败。日志是这样的:
16/07/01 22:45:43 WARN scheduler.TaskSetManager: Lost task 144178.0 in stage 0.0 (TID 144178, ip-10-0-2-176.ec2.internal): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Container marked as failed: container_1467389397754_0001_01_000006 on host: ip-10-0-2-176.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 WARN scheduler.TaskSetManager: Lost task 144181.0 in stage 0.0 (TID 144181, ip-10-0-2-176.ec2.internal): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Container marked as failed: container_1467389397754_0001_01_000006 on host: ip-10-0-2-176.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 WARN scheduler.TaskSetManager: Lost task 144175.0 in stage 0.0 (TID 144175, ip-10-0-2-176.ec2.internal): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Container marked as failed: container_1467389397754_0001_01_000006 on host: ip-10-0-2-176.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 WARN scheduler.TaskSetManager: Lost task 144213.0 in stage 0.0 (TID 144213, ip-10-0-2-176.ec2.internal): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Container marked as failed: container_1467389397754_0001_01_000006 on host: ip-10-0-2-176.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch 0)
16/07/01 22:45:43 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1467389397754_0001_01_000007 on host: ip-10-0-2-173.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 5 from BlockManagerMaster.
16/07/01 22:45:43 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(5, ip-10-0-2-176.ec2.internal, 43922)
16/07/01 22:45:43 INFO storage.BlockManagerMaster: Removed 5 successfully in removeExecutor
16/07/01 22:45:43 ERROR cluster.YarnClusterScheduler: Lost executor 6 on ip-10-0-2-173.ec2.internal: Container marked as failed: container_1467389397754_0001_01_000007 on host: ip-10-0-2-173.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 INFO spark.ExecutorAllocationManager: Existing executor 5 has been removed (new total is 41)
16/07/01 22:45:43 WARN scheduler.TaskSetManager: Lost task 144138.0 in stage 0.0 (TID 144138, ip-10-0-2-173.ec2.internal): ExecutorLostFailure (executor 6 exited caused by one of the running tasks) Reason: Container marked as failed: container_1467389397754_0001_01_000007 on host: ip-10-0-2-173.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 WARN scheduler.TaskSetManager: Lost task 144185.0 in stage 0.0 (TID 144185, ip-10-0-2-173.ec2.internal): ExecutorLostFailure (executor 6 exited caused by one of the running tasks) Reason: Container marked as failed: container_1467389397754_0001_01_000007 on host: ip-10-0-2-173.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 WARN scheduler.TaskSetManager: Lost task 144184.0 in stage 0.0 (TID 144184, ip-10-0-2-173.ec2.internal): ExecutorLostFailure (executor 6 exited caused by one of the running tasks) Reason: Container marked as failed: container_1467389397754_0001_01_000007 on host: ip-10-0-2-173.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 WARN scheduler.TaskSetManager: Lost task 144186.0 in stage 0.0 (TID 144186, ip-10-0-2-173.ec2.internal): ExecutorLostFailure (executor 6 exited caused by one of the running tasks) Reason: Container marked as failed: container_1467389397754_0001_01_000007 on host: ip-10-0-2-173.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1467389397754_0001_01_000035 on host: ip-10-0-2-173.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 INFO scheduler.DAGScheduler: Executor lost: 6 (epoch 0)
16/07/01 22:45:43 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 6 from BlockManagerMaster.
16/07/01 22:45:43 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(6, ip-10-0-2-173.ec2.internal, 43593)
16/07/01 22:45:43 INFO storage.BlockManagerMaster: Removed 6 successfully in removeExecutor
16/07/01 22:45:43 ERROR cluster.YarnClusterScheduler: Lost executor 30 on ip-10-0-2-173.ec2.internal: Container marked as failed: container_1467389397754_0001_01_000035 on host: ip-10-0-2-173.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 WARN scheduler.TaskSetManager: Lost task 144162.0 in stage 0.0 (TID 144162, ip-10-0-2-173.ec2.internal): ExecutorLostFailure (executor 30 exited caused by one of the running tasks) Reason: Container marked as failed: container_1467389397754_0001_01_000035 on host: ip-10-0-2-173.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 INFO spark.ExecutorAllocationManager: Existing executor 6 has been removed (new total is 40)
16/07/01 22:45:43 WARN scheduler.TaskSetManager: Lost task 144156.0 in stage 0.0 (TID 144156, ip-10-0-2-173.ec2.internal): ExecutorLostFailure (executor 30 exited caused by one of the running tasks) Reason: Container marked as failed: container_1467389397754_0001_01_000035 on host: ip-10-0-2-173.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 WARN scheduler.TaskSetManager: Lost task 144170.0 in stage 0.0 (TID 144170, ip-10-0-2-173.ec2.internal): ExecutorLostFailure (executor 30 exited caused by one of the running tasks) Reason: Container marked as failed: container_1467389397754_0001_01_000035 on host: ip-10-0-2-173.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 WARN scheduler.TaskSetManager: Lost task 144169.0 in stage 0.0 (TID 144169, ip-10-0-2-173.ec2.internal): ExecutorLostFailure (executor 30 exited caused by one of the running tasks) Reason: Container marked as failed: container_1467389397754_0001_01_000035 on host: ip-10-0-2-173.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
16/07/01 22:45:43 INFO scheduler.DAGScheduler: Executor lost: 30 (epoch 0)
16/07/01 22:45:43 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1467389397754_0001_01_000024 on host: ip-10-0-2-173.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
最佳答案
看起来其他人也有同样的问题,所以我只是发布答案而不是写评论。我不确定这会解决问题,但这应该是一个想法。
如果您使用现货实例,您应该知道如果价格高于您的输入,则现货实例将被关闭,您将遇到此问题。即使您只是将 Spot 实例用作从属实例。所以我的解决方案没有使用任何现货实例来进行长期运行的工作。
另一个想法是将作业分成许多独立的步骤,这样您就可以将每个步骤的结果保存为 S3 上的文件。如果发生任何错误,只需从缓存文件的那一步开始。
关于apache-spark - 在 yarn 模式下 Spark 以 "Exit status: -100. Diagnostics: Container released on a *lost* node"结束,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/34102083/
我认为我的问题与“https://serverfault.com/q/299179”和“https://serverfault.com/q/283330/71790”有些相关,但其中任何一个都没有令我
我生成了 APK 对于我的 flutter 项目和 F:\build\app\outputs\apk\release 我有 3 种类型的 apk 文件,包括 output.json 文件。他们是: *
我们最近决定更新 Beta release 的新应用程序在 Google Play 上, 现在读完指南后,我心里有一些问题,想了解更多,我用谷歌搜索进一步了解找到了一些答案,但还有一些我不确定的东西,
我正在尝试使用发布管理作为构建版本的工具,但我很难理解码件、工具和操作之间的真正区别。有人可以分解这三个概念之间的差异以及它们如何相互配合吗? 最佳答案 由于它适用于基于代理的版本: 工具旨在提供自定
我最近完成了使用 jgitflow:release-finish 合并一个发布分支来掌握和开发。 .构建成功。 但是现在我正在尝试使用 jgitflow:releast-start 创建一个新分支.但
我一直在读到,如果一个集合“被释放”,它也会释放它的所有对象。另一方面,我还读到,一旦集合被释放,集合就会释放它的对象。 但最后一件事可能并不总是发生,正如苹果所说。系统决定是否取消分配。在大多数情况
我在具有以下布局的多模块项目上使用 maven-release-plugin: ROOT/ + parent + module1 + module2 在parent的pom中,使用modu
我正在使用 ionic 构建移动应用。 我面临一个严重的问题。 我必须使用 on-touch 和 on-release 事件,但问题是每当我触摸时,on-release 甚至也会立即触发而没有实际释放
谁能解释清楚两者之间的区别是什么.Release()和->Release() 在 CComPtr 上? 确切地说,两种情况下内存管理是如何发生的? 最佳答案 CComPtr 的operator-> 函
两个片段有什么区别? [myObj release]; 和 [myObj release]; myObj = nil; 最佳答案 如果你只是释放一个对象,那么它就会变成释放对象。 如果您尝试对已释放的
我正在运行 maven 发布插件 (org.apache.maven.plugins:maven-release-plugin:2.3.2) 并注意到当通过命令行。我想知道是否有办法关闭它。 我使用
我正在尝试通过运行nuget pack -properties Configuration=Release命令来更新我的nuget软件包,但这会给我以下错误: Unable to find 'bin/
我们正在使用 Microsoft 的发布管理将我们的 Web 应用程序部署到我们的测试环境 (QA)。它是一个直接的 MVC.Net Web 应用程序。我们的构建生成一个 web 部署包,我们有一个命
我有一个在 X 环境中发布的版本 A。另一方面,我有一个在环境 Y 中发布的版本 B。 问题是我想知道我是否可以在版本 B 中检查版本 A 的状态,这样我就可以抛出错误而不发布版本 B。 我不知道是否
我正在开发一个使用大量图像的应用程序,我正在使用 UIWebView 使用 JavaScript 代码(我正在使用 UIZE 库)来表示大约 200 张图像,问题当我完成 UIWebView 时,我在
我已阅读 Marshal.GetIUnknownForObject 的文档它说: Always use Marshal.Release to decrement the reference count
为了成为 iPhone SDK 上的好内存公民,我一直在玩内存。 然而,我仍然很难理解"self.something" 和只是"something" 之间的区别。 据我了解,"self.somethi
我需要使用 bash 找出我正在运行的 Linux 发行版。找到this page ,这非常有帮助。 但是我的系统有两个/etc/*-release 文件 /etc/lsb-release /etc/
我想使用 Maven Release Plugin 将 Release Candidates 发布到我的 Nexus Snapshot 存储库。 将 RC 部署到 Nexus 不是问题,但我想利用 m
在什么情况下我们应该使用“Latch until release”而不是“Switch until release”? 根据 LabVIEW 2011 Help : Latch until relea
我是一名优秀的程序员,十分优秀!