
scala - Why does filtering on a non-existent (unselected) column work?


The following minimal example

val df1 = spark.createDataFrame(Seq((0, "a"), (1, "b"))).toDF("foo", "bar")
val df2 = df1.select($"foo")
val df3 = df2.filter($"bar" === lit("a"))

df1.printSchema
df1.show

df2.printSchema
df2.show

df3.printSchema
df3.show

runs without errors:
root
|-- foo: integer (nullable = false)
|-- bar: string (nullable = true)

+---+---+
|foo|bar|
+---+---+
|  0|  a|
|  1|  b|
+---+---+

root
|-- foo: integer (nullable = false)

+---+
|foo|
+---+
|  0|
|  1|
+---+

root
|-- foo: integer (nullable = false)

+---+
|foo|
+---+
|  0|
+---+

However, I was expecting something like
org.apache.spark.sql.AnalysisException: cannot resolve '`bar`' given input columns: [foo];

for the same reason that I get
org.apache.spark.sql.AnalysisException: cannot resolve '`asdasd`' given input columns: [foo];

when I do
val df4 = df2.filter($"asdasd" === lit("a"))

But it does not happen. Why?

Best answer

I'm inclined to call this a bug. An explain plan tells a bit more:

val df1 = Seq((0, "a"), (1, "b")).toDF("foo", "bar")

df1.select("foo").where($"bar" === "a").explain(true)
// == Parsed Logical Plan ==
// 'Filter ('bar = a)
// +- Project [foo#4]
//    +- Project [_1#0 AS foo#4, _2#1 AS bar#5]
//       +- LocalRelation [_1#0, _2#1]
//
// == Analyzed Logical Plan ==
// foo: int
// Project [foo#4]
// +- Filter (bar#5 = a)
//    +- Project [foo#4, bar#5]
//       +- Project [_1#0 AS foo#4, _2#1 AS bar#5]
//          +- LocalRelation [_1#0, _2#1]
//
// == Optimized Logical Plan ==
// LocalRelation [foo#4]
//
// == Physical Plan ==
// LocalTableScan [foo#4]

Evidently, both the parsed logical plan and the analyzed (or resolved) logical plan still carry bar in their Project nodes (i.e. projections), so the filtering operation continues to honor the supposedly removed column.
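If you want to see this without reading the full explain output, a minimal sketch (not part of the original answer, assuming the same df1 as above and Spark's developer-level queryExecution API) is to print the individual plans directly:

val df3 = df1.select("foo").where($"bar" === "a")

// The analyzed plan still resolves bar#5 inside the widened Project
println(df3.queryExecution.analyzed)

// The optimized plan has already collapsed the projections, so bar is gone
println(df3.queryExecution.optimizedPlan)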

On a related note, the logical plan of the following query also includes the dropped column, hence exhibits a similar anomaly:
df1.drop("bar").where($"bar" === "a")

Regarding "scala - Why does filtering on a non-existent (unselected) column work?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59597678/
