gpt4 book ai didi

apache-spark - 如何解决此错误org.apache.spark.sql.catalyst.errors.package $ TreeNodeException

转载 作者:行者123 更新时间:2023-12-04 05:24:52 24 4
gpt4 key购买 nike

我有两个过程,每个过程都要做
1)连接oracle db读取特定表
2)形成数据框并对其进行处理。
3)将df保存到cassandra。

如果我同时运行两个进程,则都尝试从oracle读取
当第二个进程读取数据时,我遇到以下错误

 ERROR ValsProcessor2: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#290L])
+- *(1) Scan JDBCRelation((SELECT * FROM BM_VALS WHERE ROWNUM <= 10) T) [numPartitions=2] [] PushedFilters: [], ReadSchema: struct<>
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:150)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:294)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2770)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2769)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3254)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3253)
at org.apache.spark.sql.Dataset.count(Dataset.scala:2769)
at com.snp.processors.BenchmarkModelValsProcessor2.process(BenchmarkModelValsProcessor2.scala:43)
at com.snp.utils.Utils$$anonfun$getAllDefinedProcessors$2.apply(Utils.scala:28)
at com.snp.utils.Utils$$anonfun$getAllDefinedProcessors$2.apply(Utils.scala:28)
at com.sp.MigrationDriver$$anonfun$main$2$$anonfun$apply$1.apply(MigrationDriver.scala:78)
at com.sp.MigrationDriver$$anonfun$main$2$$anonfun$apply$1.apply(MigrationDriver.scala:78)
at scala.Option.map(Option.scala:146)
at com.sp.MigrationDriver$$anonfun$main$2.apply(MigrationDriver.scala:75)
at com.sp.MigrationDriver$$anonfun$main$2.apply(MigrationDriver.scala:74)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.MapLike$DefaultKeySet.foreach(MapLike.scala:174)
at com.sp.MigrationDriver$.main(MigrationDriver.scala:74)
at com.sp.MigrationDriver.main(MigrationDriver.scala)
Caused by: java.lang.NullPointerException
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.needToCopyObjectsBeforeShuffle(ShuffleExchangeExec.scala:163)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$.prepareShuffleDependency(ShuffleExchangeExec.scala:300)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:91)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
... 37 more

我在这里做错了什么?如何解决这个问题?

最佳答案

我不确定真正的原因,唯一引起我注意的是SQL表达式:(SELECT * FROM BM_VALS WHERE ROWNUM <= 10) T-T在这里意味着什么?

关于整体设计,我建议使用完全不同的方法。在您的情况下,您有2个处理器处理从Oracle收集的相同数据,并且每个处理器分别获取数据。我建议将读取Oracle数据移到单独的过程中,该过程将返回数据帧(您需要对其进行缓存),然后您的处理器将在该数据帧上工作并将数据持久保存到Cassandra中。

或者按照之前的建议,您可以将作业分为两部分-一项将从Oracle中提取所有数据,然后将数据帧存储到磁盘(不是persist,而是使用write),例如,作为Parquet文件。然后,将要从磁盘中获取数据并执行必要转换的作业分开。

在这两种情况下,您

关于apache-spark - 如何解决此错误org.apache.spark.sql.catalyst.errors.package $ TreeNodeException,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/53011256/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com