python - PySpark: StructField(..., ..., False) always returns `nullable=true` instead of `nullable=false`


Reposted by: 太空狗. Updated: 2023-10-29 21:00:28


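I am new to PySpark and am facing a strange problem. I am trying to set some columns as non-nullable while loading a CSV dataset. I can reproduce my case with a very small dataset (test.csv):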

col1,col2,col3
11,12,13
21,22,23
31,32,33
41,42,43
51,,53

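Row 5, column 2 holds an empty value, and I don't want that row in my DataFrame. I declared all fields as non-nullable (nullable=false), but the schema I get back reports nullable=true for all three columns. This happens even though all three columns are declared non-nullable! I am running the latest available Spark version, 2.0.1.

Here is the code: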

from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

# Declare every column as non-nullable (third argument: nullable=False).
struct = StructType([
    StructField("col1", StringType(), False),
    StructField("col2", StringType(), False),
    StructField("col3", StringType(), False),
])

# Load the CSV with the explicit schema; the first line is the header.
df = spark.read.load("test.csv", schema=struct, format="csv", header="true")

df.printSchema() returns:

root
 |-- col1: string (nullable = true)
 |-- col2: string (nullable = true)
 |-- col3: string (nullable = true)

and df.show() returns:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|  11|  12|  13|
|  21|  22|  23|
|  31|  32|  33|
|  41|  42|  43|
|  51|null|  53|
+----+----+----+

whereas I expected this:

root
 |-- col1: string (nullable = false)
 |-- col2: string (nullable = false)
 |-- col3: string (nullable = false)

+----+----+----+
|col1|col2|col3|
+----+----+----+
|  11|  12|  13|
|  21|  22|  23|
|  31|  32|  33|
|  41|  42|  43|
+----+----+----+

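Best answer

While Spark's behavior here (switching from False to True) is confusing, nothing is fundamentally wrong. The nullable argument is not a constraint. It reflects the semantics of the source and of the types, which enables certain kinds of optimizations.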

You state that you want to avoid null values in your data. For that, you should use the na.drop method.

df.na.drop()

For other ways of handling nulls, see the DataFrameNaFunctions documentation (exposed through the DataFrame.na property).

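The CSV format provides no facility for specifying data constraints, so by definition the reader cannot assume the input is non-null, and your data does in fact contain nulls.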

The original question, "python - PySpark: StructField(..., ..., False) always returns `nullable=true` instead of `nullable=false`", can be found on Stack Overflow: https://stackoverflow.com/questions/39917075/
