
scala - How to avoid automatic casting of ArrayType in Spark (2.4) SQL - Scala 2.11

Reposted · Author: 行者123 · Updated: 2023-12-04

Given this code in Spark 2.4 with Scala 2.11:

val df = spark.sql("""select array(45, "something", 45)""")

If I print the schema with df.printSchema(), I can see that Spark automatically casts the integers to strings, e.g. CAST(45 AS STRING):

root
|-- array(CAST(45 AS STRING), something, CAST(45 AS STRING)): array (nullable = false)
| |-- element: string (containsNull = false)

I would like to know whether there is a way to avoid this automatic casting and instead have Spark SQL fail with an exception, say, when I subsequently call any action such as df.collect().

This is just one example query, but the check should apply to any query.
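One possible guard (my own sketch, not part of the original question or answer): walk the analyzed logical plan and report any Cast nodes the analyzer inserted, before running any action. Caveat: casts written explicitly in the query text would also trip this check, since they appear as the same Cast expression nodes.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.Cast

val spark = SparkSession.builder().master("local[*]").appName("cast-guard").getOrCreate()
val df = spark.sql("""select array(45, "something", 45)""")

// Collect every Cast expression present in the analyzed plan.
// For this query the analyzer has wrapped the two integer literals in casts to string.
val casts = df.queryExecution.analyzed.expressions
  .flatMap(_.collect { case c: Cast => c })

// Here we only report; in a real job you would throw instead of printing.
if (casts.nonEmpty)
  println(s"Implicit casts were inserted: ${casts.mkString(", ")}")

spark.stop()
```

This runs entirely at analysis time, so no data is read before the check fires.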

Best Answer

This creates an ArrayType column in the DataFrame.

From the Scaladocs: an ArrayType object comprises two fields, elementType: DataType and containsNull: Boolean. The elementType field specifies the type of the array elements, and the containsNull field specifies whether the array can hold null values.

Therefore an ArrayType column accepts only a single element type. If values of different types are passed to the array function, Spark first tries to cast them all to the most suitable common type. Only when the types are entirely incompatible does Spark throw an exception. Examples below:

val df = spark.sql("""select array(45, 46L, 45.45)""")
df.printSchema()

root
|-- array(CAST(45 AS DECIMAL(22,2)), CAST(46 AS DECIMAL(22,2)), CAST(45.45 AS DECIMAL(22,2))): array (nullable = false)
| |-- element: decimal(22,2) (containsNull = false)

df: org.apache.spark.sql.DataFrame = [array(CAST(45 AS DECIMAL(22,2)), CAST(46 AS DECIMAL(22,2)), CAST(45.45 AS DECIMAL(22,2))): array<decimal(22,2)>]

The next one results in an error:

val df = spark.sql("""select array(45, 46L, True)""")
df.printSchema()

org.apache.spark.sql.AnalysisException: cannot resolve 'array(45, 46L, true)' due to data type mismatch: input to function array should all be the same type, but it's [int, bigint, boolean]; line 1 pos 7;
'Project [unresolvedalias(array(45, 46, true), None)]
+- OneRowRelation

at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:126)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:111)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:77)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:303)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:301)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:301)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:354)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:352)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:301)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:94)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:94)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$3.apply(QueryPlan.scala:106)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$3.apply(QueryPlan.scala:106)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:77)

Regarding scala - How to avoid automatic casting of ArrayType in Spark (2.4) SQL - Scala 2.11, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59827003/
