
dataframe - Nullable fields change when writing a Spark DataFrame

Reposted. Author: 行者123. Updated: 2023-12-05 01:45:53

The following code reads a Spark DataFrame from a Parquet file and writes it out to another Parquet file. After the DataFrame is written to the new Parquet file, the nullable flag inside the ArrayType DataType changes.

Code:

    // Set up a local Spark context and SQLContext (Spark 1.6 API)
    SparkConf sparkConf = new SparkConf();
    String master = "local[2]";
    sparkConf.setMaster(master);
    sparkConf.setAppName("Local Spark Test");
    JavaSparkContext sparkContext = new JavaSparkContext(new SparkContext(sparkConf));
    SQLContext sqc = new SQLContext(sparkContext);

    // Read the original Parquet file and print the type of the third field
    DataFrame dataFrame = sqc.read().parquet("src/test/resources/users.parquet");
    StructField[] fields = dataFrame.schema().fields();
    System.out.println(fields[2].dataType());

    // Write the DataFrame out to a new Parquet file
    dataFrame.write().mode(SaveMode.Overwrite).parquet("src/test/resources/users1.parquet");

    // Read the new file back and print the same field's type again
    DataFrame dataFrame1 = sqc.read().parquet("src/test/resources/users1.parquet");
    StructField[] fields1 = dataFrame1.schema().fields();
    System.out.println(fields1[2].dataType());

Output:

ArrayType(IntegerType,false)
ArrayType(IntegerType,true)

Spark version: 1.6.2

Best Answer

In Spark 2.4 and earlier, every column written out by Spark SQL becomes nullable. Quoting the official guide:

Parquet is a columnar format that is supported by many other data processing systems. Spark SQL provides support for both reading and writing Parquet files that automatically preserves the schema of the original data. When writing Parquet files, all columns are automatically converted to be nullable for compatibility reasons.
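In other words, the nullability change is expected behavior of the Parquet writer, not data corruption. If the original non-nullable flags matter downstream, one workaround (a sketch, not part of the original answer; the helper name `withOriginalSchema` is hypothetical) is to capture the schema before writing and re-apply it to the DataFrame read back, using the Spark 1.6 `SQLContext.createDataFrame(RDD<Row>, StructType)` overload:

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.StructType;

public class RestoreSchema {
    /**
     * Re-attach the original schema (including nullable = false flags)
     * to a DataFrame read back from Parquet. Only the schema metadata
     * changes; the rows themselves are untouched. The caller must
     * ensure the data actually contains no nulls in those columns.
     */
    static DataFrame withOriginalSchema(SQLContext sqc,
                                        DataFrame readBack,
                                        StructType originalSchema) {
        return sqc.createDataFrame(readBack.rdd(), originalSchema);
    }
}
```

For the code in the question, this would mean saving `dataFrame.schema()` before the write and passing it along with `dataFrame1` after the read, after which `fields1[2].dataType()` would again report `ArrayType(IntegerType,false)`. Note this only changes what the schema claims; Spark does not re-validate the data against it.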

Regarding "dataframe - Nullable fields change when writing a Spark DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39697193/
