
scala - NullPointerException when reading a column from a Row


The following Scala (Spark 1.6) code for reading a value from a Row fails with a NullPointerException when the value is null:

val test = row.getAs[Int]("ColumnName").toString

While this works fine:

val test1 = row.getAs[Int]("ColumnName") // returns 0 for null
val test2 = test1.toString // converts to String fine
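
The two-step version works because asInstanceOf[Int] on a null reference compiles down to BoxesRunTime.unboxToInt(null), which returns the numeric default 0 rather than throwing. A minimal sketch in plain Scala (no Spark needed) illustrating this unboxing behavior:

// Unboxing a null reference yields the numeric default instead of throwing.
val boxed: Any = null
val unboxed: Int = boxed.asInstanceOf[Int] // BoxesRunTime.unboxToInt(null) == 0
println(unboxed)          // prints 0
println(unboxed.toString) // prints "0" -- toString on a real Int is safe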

What causes the NullPointerException, and what is the recommended way to handle such cases?

PS: The rows are obtained from a DataFrame as follows:

val myRDD = myDF.repartition(partitions)
  .mapPartitions { rows =>
    rows.flatMap { row =>
      functionWithRows(row) // has above logic to read null column which fails
    }
  }

functionWithRows then hits the NullPointerException mentioned above.
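
The body of functionWithRows is not shown in the question; a hypothetical minimal version that reproduces the failure might look like this:

import org.apache.spark.sql.Row

// Hypothetical reconstruction -- the real functionWithRows is not shown.
def functionWithRows(row: Row): Iterator[String] = {
  // Throws NullPointerException when ColumnName is null (see answer below)
  Iterator(row.getAs[Int]("ColumnName").toString)
}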

MyDF's schema:

root
|-- LDID: string (nullable = true)
|-- KTAG: string (nullable = true)
|-- ColumnName: integer (nullable = true)

Best Answer

getAs is defined as:

def getAs[T](i: Int): T = get(i).asInstanceOf[T]

When we call toString we are calling Object.toString, which does not depend on the type, so the asInstanceOf[T] is dropped by the compiler (type erasure), i.e.

row.getAs[Int](0).toString -> row.get(0).toString

We can confirm this by writing a simple Scala program:

import org.apache.spark.sql._

object Test {
  val row = Row(null)
  row.getAs[Int](0).toString
}

and then compiling it:

$ scalac -classpath $SPARK_HOME/jars/'*' -print test.scala
[[syntax trees at end of cleanup]] // test.scala
package <empty> {
  object Test extends Object {
    private[this] val row: org.apache.spark.sql.Row = _;
    <stable> <accessor> def row(): org.apache.spark.sql.Row = Test.this.row;
    def <init>(): Test.type = {
      Test.super.<init>();
      Test.this.row = org.apache.spark.sql.Row.apply(scala.this.Predef.genericWrapArray(Array[Object]{null}));
      Test.this.row().getAs(0).toString();
      ()
    }
  }
}

The cast is gone, so toString is invoked directly on the null reference returned by get(0). So the correct way is:

String.valueOf(row.getAs[Int](0))
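
Note that String.valueOf(row.getAs[Int](0)) yields "0" for a null value, because the Int static type forces unboxing before the call. If null must be distinguished from a genuine 0, here is a sketch of explicit null checking with the Row API (isNullAt and fieldIndex), assuming the schema shown above:

import org.apache.spark.sql.Row

// A sketch of explicit null handling; "ColumnName" is the nullable
// integer column from the schema above.
def readColumnName(row: Row): Option[Int] = {
  val i = row.fieldIndex("ColumnName")  // resolve the column's position
  if (row.isNullAt(i)) None             // never dereference the null slot
  else Some(row.getInt(i))              // safe: the slot is non-null here
}

// Usage: readColumnName(row).map(_.toString).getOrElse("null")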

Regarding "scala - NullPointerException when reading a column from a Row", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47882574/
