
postgresql - Spark SQL 2.0: NullPointerException with a valid PostgreSQL query

Reposted · Author: 行者123 · Updated: 2023-11-29 11:38:19

I have a valid PostgreSQL query: when I copy/paste it into psql, I get the expected result.
However, when I run it with Spark SQL, it throws a NullPointerException.

Here is the code snippet that triggers the error:

extractDataFrame().show()

private def extractDataFrame(): DataFrame = {
  // The parenthesized subquery with an alias is what the JDBC source expects
  // in place of a plain table name.
  val query =
    """(
      SELECT events.event_facebook_id, events.name, events.tariffrange,
        eventscounts.attending_count, eventscounts.declined_count, eventscounts.interested_count,
        eventscounts.noreply_count,
        artists.facebookid as artist_facebook_id, artists.likes as artistlikes,
        organizers.organizerid, organizers.likes as organizerlikes,
        places.placeid, places.capacity, places.likes as placelikes
      FROM events
      LEFT JOIN eventscounts on eventscounts.event_facebook_id = events.event_facebook_id
      LEFT JOIN eventsartists on eventsartists.event_id = events.event_facebook_id
      LEFT JOIN artists on eventsartists.artistid = artists.facebookid
      LEFT JOIN eventsorganizers on eventsorganizers.event_id = events.event_facebook_id
      LEFT JOIN organizers on eventsorganizers.organizerurl = organizers.facebookurl
      LEFT JOIN eventsplaces on eventsplaces.event_id = events.event_facebook_id
      LEFT JOIN places on eventsplaces.placefacebookurl = places.facebookurl
    ) df"""

  spark.sqlContext.read.jdbc(databaseURL, query, connectionProperties)
}

The SparkSession is defined as follows:

val databaseURL = "jdbc:postgresql://dbHost:5432/ticketapp"
val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("tariffPrediction")
  .getOrCreate()

val connectionProperties = new Properties
connectionProperties.put("user", "simon")
connectionProperties.put("password", "root")

Here is the full stack trace:

[SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure: Lost task 0.0 in stage 27.0 (TID 27, localhost): java.lang.NullPointerException
at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:210)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:]

The most surprising part is that if I remove any single LEFT JOIN clause (no matter which one) from the SQL query, I don't get any error at all...

Best Answer

I had a very similar problem with a Teradata data source, and it came down to the nullability of the DataFrame's columns not matching the underlying data: a column was reported as nullable=false, yet some rows had a null value in that field under certain circumstances. In my case, the cause was the Teradata JDBC driver not returning the correct column metadata. I have not yet found a fix for it.
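One commonly suggested mitigation for this class of problem (a sketch, not confirmed by the answerer) is to rebuild the DataFrame with every column forced to nullable, so that Spark's generated row-writing code performs null checks instead of assuming non-null values. The helper name `setNullable` is hypothetical; it assumes Spark 2.0+, where `DataFrame.sparkSession` is available:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StructField, StructType}

// Hypothetical workaround: relax the schema so every column is nullable,
// working around JDBC drivers that report incorrect nullability metadata.
def setNullable(df: DataFrame): DataFrame = {
  val relaxedSchema = StructType(df.schema.map {
    case StructField(name, dataType, _, metadata) =>
      StructField(name, dataType, nullable = true, metadata)
  })
  // Rebuilding the DataFrame from its RDD with the relaxed schema forces
  // Spark to generate null-safe code for these columns.
  df.sparkSession.createDataFrame(df.rdd, relaxedSchema)
}
```

With this in scope, the question's `extractDataFrame()` result could be wrapped as `setNullable(extractDataFrame())` before calling `.show()`.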

To inspect the generated code (the code in which the NPE is thrown):

  • Import org.apache.spark.sql.execution.debug._
  • Call .debugCodegen() on the Dataset/DataFrame
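The two steps above can be sketched as follows, assuming `df` is the DataFrame returned by the question's `extractDataFrame()`:

```scala
// Bring the debug extensions (including debugCodegen) into scope.
import org.apache.spark.sql.execution.debug._

// Prints the whole-stage-codegen Java source for this plan to stdout;
// the line numbers in it correspond to those in the NPE stack trace.
df.debugCodegen()
```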

Hope this helps.

Regarding "postgresql - Spark SQL 2.0: NullPointerException with a valid PostgreSQL query", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39875711/
