
java.lang.OutOfMemoryError as the Spark DAG grows to more stages


I have a Spark job that runs SQL joins.

I visualized the DAG, and it creates 5+ stages for each join. In any case, once the DAG reaches about 40 stages, the next step always fails with an exception, i.e. after 8 iterations of 5 stages each.

Exception in thread "main" java.lang.OutOfMemoryError
    at java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:161)
    at java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:155)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at scala.StringContext.standardInterpolator(StringContext.scala:125)
    at scala.StringContext.s(StringContext.scala:95)
    at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:230)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:54)
    at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
    at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2420)
    at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2419)
    at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2801)
    at org.apache.spark.sql.Dataset.count(Dataset.scala:2419)
    at com.samsung.cloud.mopay.linking.controller.PostNotificLinkController.linkPostNotific(PostNotificLinkController.java:51)
    at com.samsung.cloud.mopay.linking.txn.TxnLinking.performTxnLinking(TxnLinking.java:26)
    at com.samsung.cloud.mopay.linking.Linking.processData(Linking.java:199)
    at com.samsung.cloud.mopay.linking.Linking.main(Linking.java:72)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I am running Spark with:

--conf spark.driver.memory=30g 
--conf spark.yarn.driver.memoryOverhead=15g
--conf spark.executor.memory=10g
--conf spark.yarn.executor.memoryOverhead=6g
--conf spark.executor.cores=5

3 executors per node (r3.2xlarge) => 12 executor instances
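
For reference, a spark-submit invocation assembling these flags might look like the sketch below. The main class is taken from the stack trace; the jar name and the --num-executors value are assumptions added for illustration:

spark-submit \
  --class com.samsung.cloud.mopay.linking.Linking \
  --master yarn \
  --conf spark.driver.memory=30g \
  --conf spark.yarn.driver.memoryOverhead=15g \
  --conf spark.executor.memory=10g \
  --conf spark.yarn.executor.memoryOverhead=6g \
  --conf spark.executor.cores=5 \
  --num-executors 12 \
  linking.jar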

Best answer

The solution was to persist the data every few stages. This starts a new execution plan rather than appending to the existing one, which otherwise keeps growing larger and larger until it runs out of memory.
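
The answer does not show code, so the sketch below is one possible way to implement it, assuming an iterative-join loop like the one described in the question. It uses Dataset.checkpoint() (available since Spark 2.1), which materializes the data and truncates the logical plan, so QueryExecution.toString no longer has to render a plan string that grows with every join; plain persist() alone would cache the data but keep the full plan. All paths, column names, and the loop structure are hypothetical:

// A minimal sketch: truncating the plan every few joins via checkpoint().
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class IterativeJoinSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("iterative-join-sketch")
                .getOrCreate();
        // Checkpointing writes to reliable storage; the directory is hypothetical.
        spark.sparkContext().setCheckpointDir("/tmp/spark-checkpoints");

        Dataset<Row> result = spark.read().parquet("/data/base");       // hypothetical input
        for (int i = 0; i < 8; i++) {
            Dataset<Row> delta = spark.read().parquet("/data/delta_" + i); // hypothetical input
            result = result.join(delta, "id");  // each join adds ~5 stages to the plan

            // Every few iterations, materialize the result and start a fresh plan
            // so the accumulated plan never grows unboundedly.
            if ((i + 1) % 3 == 0) {
                result = result.checkpoint();   // eager by default; returns a Dataset with a truncated plan
            }
        }
        System.out.println(result.count());
        spark.stop();
    }
}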

Regarding java.lang.OutOfMemoryError as the Spark DAG grows to more stages, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46009011/
