
sql - Difference between === null and isNull in Spark DataFrame

Reposted. Author: 行者123. Updated: 2023-12-03 13:21:07

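I am a bit confused about the difference between

 df.filter(col("c1") === null) and df.filter(col("c1").isNull)

On the same DataFrame I am getting counts with === null but zero counts with isNull. Please help me understand the difference. Thanks.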

Best answer

First and foremost, don't use null in your Scala code unless you really have to for compatibility reasons.

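Regarding your question, it is plain SQL. col("c1") === null is interpreted as c1 = NULL and, because NULL marks an undefined value, the result is undefined for any value, including NULL itself. The comparison therefore evaluates to NULL rather than true or false, and since filter keeps only rows for which the predicate is true, it matches no rows.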

spark.sql("SELECT NULL = NULL").show

+-------------+
|(NULL = NULL)|
+-------------+
|         null|
+-------------+

spark.sql("SELECT NULL != NULL").show

+-------------------+
|(NOT (NULL = NULL))|
+-------------------+
|               null|
+-------------------+

spark.sql("SELECT TRUE != NULL").show

+------------------------------------+
|(NOT (true = CAST(NULL AS BOOLEAN)))|
+------------------------------------+
|                                null|
+------------------------------------+

spark.sql("SELECT TRUE = NULL").show

+------------------------------+
|(true = CAST(NULL AS BOOLEAN))|
+------------------------------+
|                          null|
+------------------------------+
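The behaviour shown above can be modelled without Spark. The following is only a sketch of SQL's three-valued logic, not Spark's implementation: None stands in for NULL, and the helper names sqlEq and sqlFilter are made up for this illustration. It is written REPL/script style, like the snippets in the answer.

```scala
// None plays the role of SQL NULL; Some(x) is a defined value.
// sqlEq mimics SQL "=": if either side is NULL, the result is NULL (None).
def sqlEq[A](a: Option[A], b: Option[A]): Option[Boolean] =
  for { x <- a; y <- b } yield x == y

// A WHERE clause / filter keeps a row only when the predicate is TRUE;
// both FALSE and NULL cause the row to be dropped.
def sqlFilter[A](rows: Seq[Option[A]])(pred: Option[A] => Option[Boolean]): Seq[Option[A]] =
  rows.filter(r => pred(r).contains(true))

// Hypothetical column with one NULL entry.
val c1: Seq[Option[String]] = Seq(Some("a"), None, Some("b"))

// Counterpart of df.filter(col("c1") === null): the predicate is NULL for every row.
val eqNull = sqlFilter(c1)(v => sqlEq(v, None))

println(sqlEq(None, None))        // None, i.e. NULL = NULL is NULL
println(sqlEq(Some(1), Some(1)))  // Some(true)
println(eqNull.size)              // 0, no row survives the filter
```

Even the row whose value really is NULL is dropped, because comparing it with NULL yields NULL, not true.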

The only valid methods to check for NULL are:
  • IS NULL:

    spark.sql("SELECT NULL IS NULL").show

    +--------------+
    |(NULL IS NULL)|
    +--------------+
    |          true|
    +--------------+

    spark.sql("SELECT TRUE IS NULL").show

    +--------------+
    |(true IS NULL)|
    +--------------+
    |         false|
    +--------------+
  • IS NOT NULL:

    spark.sql("SELECT NULL IS NOT NULL").show

    +------------------+
    |(NULL IS NOT NULL)|
    +------------------+
    |             false|
    +------------------+

    spark.sql("SELECT TRUE IS NOT NULL").show

    +------------------+
    |(true IS NOT NULL)|
    +------------------+
    |              true|
    +------------------+

These are implemented in the DataFrame DSL as Column.isNull and Column.isNotNull, respectively.
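As a rough illustration of the DataFrame DSL side, the sketch below compares the three filters on a tiny DataFrame. The data, the column names id and c1, and the local session settings are assumptions made for this example. It is meant to run as a standalone script with Spark on the classpath; in spark-shell, where spark already exists, the session-creation and implicits lines can be skipped.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a throwaway local session, used only for this experiment.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("null-check-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical data: c1 is NULL in exactly one of three rows.
val rows: Seq[(Int, Option[String])] = Seq((1, Some("a")), (2, None), (3, Some("b")))
val df = rows.toDF("id", "c1")

// c1 = NULL evaluates to NULL for every row, so nothing passes the filter.
val viaEquals = df.filter(col("c1") === null).count()
// IS NULL and IS NOT NULL always evaluate to true or false.
val viaIsNull = df.filter(col("c1").isNull).count()
val viaIsNotNull = df.filter(col("c1").isNotNull).count()

println(s"=== null: $viaEquals, isNull: $viaIsNull, isNotNull: $viaIsNotNull")
// prints "=== null: 0, isNull: 1, isNotNull: 2"
```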

    Note:

    For NULL-safe comparisons, use IS DISTINCT FROM / IS NOT DISTINCT FROM:

    spark.sql("SELECT NULL IS NOT DISTINCT FROM NULL").show

    +---------------+
    |(NULL <=> NULL)|
    +---------------+
    |           true|
    +---------------+

    spark.sql("SELECT NULL IS NOT DISTINCT FROM TRUE").show

    +--------------------------------+
    |(CAST(NULL AS BOOLEAN) <=> true)|
    +--------------------------------+
    |                           false|
    +--------------------------------+

    and not(_ <=> _) / <=>
    spark.sql("SELECT NULL AS col1, NULL AS col2").select($"col1" <=> $"col2").show

    +---------------+
    |(col1 <=> col2)|
    +---------------+
    |           true|
    +---------------+

    spark.sql("SELECT NULL AS col1, TRUE AS col2").select($"col1" <=> $"col2").show

    +---------------+
    |(col1 <=> col2)|
    +---------------+
    |          false|
    +---------------+

    in SQL and the DataFrame DSL, respectively.
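To see how the NULL-safe operators differ from plain equality inside a filter, here is a small sketch along the same lines. The three sample rows and the local session are assumptions for illustration only; as before, it is a standalone script, and in spark-shell the existing spark session can be used instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, not}

// Assumption: a throwaway local session, used only for this experiment.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("null-safe-eq-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical rows: (1, 1), (NULL, NULL) and (1, NULL).
val pairs: Seq[(Option[Int], Option[Int])] =
  Seq((Some(1), Some(1)), (None, None), (Some(1), None))
val df = pairs.toDF("col1", "col2")

// Plain equality: only (1, 1) is true; the other two rows compare to NULL.
val plainEq = df.filter(col("col1") === col("col2")).count()
// NULL-safe equality: (1, 1) and (NULL, NULL) are both true.
val nullSafeEq = df.filter(col("col1") <=> col("col2")).count()
// NULL-safe inequality: only (1, NULL) is true.
val nullSafeNeq = df.filter(not(col("col1") <=> col("col2"))).count()

println(s"===: $plainEq, <=>: $nullSafeEq, not(<=>): $nullSafeNeq")
// prints "===: 1, <=>: 2, not(<=>): 1"
```

Because <=> never returns NULL, every row ends up in exactly one of the last two filters, which is not the case for === and =!=.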

    Related:

    Including null values in an Apache Spark Join

    Regarding sql - Difference between === null and isNull in Spark DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41533290/
