
scala - org.apache.spark.SparkException : Failed to execute user defined function


I am new to Scala and I am trying to execute the following code:

val SetID = udf { (c: String, d: String) =>
  if (c.UpperCase.contains("EXKLUS") == true) { d }
  else { "" }
}
val ParquetWithID = STG1
  .withColumn("ID", SetID(col("line_item"), col("line_item_ID")))

Both columns (line_item and line_item_ID) are defined as Strings in the STG1 schema.
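For reference, the column types can be confirmed with Spark's standard printSchema method; the commented output below mirrors the schema quoted in the answer:

STG1.printSchema()
// root
//  |-- line_item: string (nullable = true)
//  |-- line_item_ID: string (nullable = true)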

When I try to run the code I get the following error:
org.apache.spark.SparkException: Failed to execute user defined function($anonfun$1$$anonfun$2: (string, string) => string)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


Caused by: java.lang.NullPointerException
at MyTests$$anonfun$1$$anonfun$2.apply(MyTests.scala:356)
at MyTests$$anonfun$1$$anonfun$2.apply(MyTests.scala:355)
... 16 more

I also tried c.UpperCase().contains("EXKLUS") but I got the same error.
However, if I just run an "if equals" statement, everything works fine. So I guess the problem lies in using the UpperCase().contains(" ") functions in my udf, but I don't understand where the problem comes from. Any help would be appreciated!
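A minimal, self-contained sketch (with hypothetical data) that reproduces this failure mode: calling toUpperCase on a null value inside the UDF throws exactly the NullPointerException seen in the stack trace above.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// One row carries a null line_item, mirroring "nullable = true" in the schema.
val df = Seq((null.asInstanceOf[String], "42"), ("Exklusiv", "43"))
  .toDF("line_item", "line_item_ID")

val setId = udf { (c: String, d: String) =>
  if (c.toUpperCase.contains("EXKLUS")) d else ""  // NPE when c is null
}

// Fails with: SparkException: Failed to execute user defined function
// Caused by: java.lang.NullPointerException
df.withColumn("ID", setId(col("line_item"), col("line_item_ID"))).show()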

Best Answer

If the schema contains the columns as

 |-- line_item: string (nullable = true)
 |-- line_item_ID: string (nullable = true)

then checking for null in your if statement should solve the issue (note that Strings have a toUpperCase method):
val SetID = udf { (c: String, d: String) =>
  if (c != null && c.toUpperCase.contains("EXKLUS")) { d }
  else { "" }
}
val ParquetWithID = STG1
  .withColumn("ID", SetID(col("line_item"), col("line_item_ID")))
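
Alternatively (a sketch, not part of the original answer), the same logic can be expressed with Spark's built-in column functions, which propagate nulls safely and avoid the UDF entirely; ParquetWithID2 is just an illustrative name:

import org.apache.spark.sql.functions.{col, lit, upper, when}

// upper(null) evaluates to null, the when condition then does not match,
// and the otherwise branch supplies the empty string.
val ParquetWithID2 = STG1.withColumn(
  "ID",
  when(upper(col("line_item")).contains("EXKLUS"), col("line_item_ID"))
    .otherwise(lit(""))
)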

I hope the answer helps.

Regarding scala - org.apache.spark.SparkException: Failed to execute user defined function, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45739168/
