
java - Spark : not understood behaviour when writing to parquet file - datatypes

Reposted. Author: 塔克拉玛干. Updated: 2023-11-01 22:59:30

I have CSV records like this:

name | age | entranceDate
-----|-----|-------------
Tom  | 12  | 2019-10-01
Mary | 15  | 2019-10-01

I read them from the CSV file with a custom schema and turn them into a DataFrame:

public static StructType createSchema() {
    final StructType schema = DataTypes.createStructType(Arrays.asList(
        DataTypes.createStructField("name", DataTypes.StringType, false),
        DataTypes.createStructField("age", DataTypes.StringType, false),
        DataTypes.createStructField("entranceDate", DataTypes.StringType, false)
    ));
    return schema;
}


sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("inferSchema", "false")
    .option("delimiter", FIELD_DELIMITER)
    .option("header", "false")
    .schema(schema)
    .load(pathToMyCsvFile);

Now I want to write this DataFrame to Parquet on my HDFS:

String[] partitions =
    new String[] {
        "name",
        "entranceDate"
    };

df.write()
    .partitionBy(partitions)
    .mode(SaveMode.Append)
    .parquet(parquetPath);

But when I check the schema of the Parquet data in spark-shell:

sqlContext.read.parquet("/test/parquet/name=Tom/entranceDate=2019-10-01/").printSchema()

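It shows that entranceDate is of type Date. I don't understand how that happened. I specified that this field should be a String, so how does it get converted to Date automatically?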

----------------

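Edit: I ran some tests and found that the column only becomes a Date when I call .partitionBy(partitions) while writing. If I remove that line and print the schema, the type of entranceDate is shown as String.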

Best answer

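I would say this happens because of the automatic schema inference mechanism for partition columns. With partitionBy, the values of the partition columns are stored in the directory names (for example entranceDate=2019-10-01) rather than inside the Parquet files, so when the data is read back Spark has to infer their types from those path strings. The Spark documentation page says: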

Notice that the data types of the partitioning columns are automatically inferred. Currently, numeric data types, date, timestamp and string type are supported.
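To make the effect easier to picture, here is a small plain-Java sketch of the idea. It is only an illustration under simplifying assumptions and not Spark's actual implementation: the class `PartitionTypeSketch` and its methods are hypothetical names, and it only tries an integer and an ISO date before falling back to string.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Simplified illustration only; this is NOT Spark's actual inference code.
final class PartitionTypeSketch {

    // Guess a type name for the value part of a "column=value" directory name.
    static String inferType(String rawValue) {
        try {
            Integer.parseInt(rawValue);
            return "integer";
        } catch (NumberFormatException ignored) {
            // Not an integer, try the next candidate.
        }
        try {
            LocalDate.parse(rawValue); // expects ISO format such as 2019-10-01
            return "date";
        } catch (DateTimeParseException ignored) {
            // Not a date either, fall back to string.
        }
        return "string";
    }

    // Split "column=value" and report the guessed type of the value.
    static String describe(String directoryName) {
        int eq = directoryName.indexOf('=');
        String column = directoryName.substring(0, eq);
        String value = directoryName.substring(eq + 1);
        return column + ": " + inferType(value);
    }

    public static void main(String[] args) {
        System.out.println(describe("name=Tom"));                // name: string
        System.out.println(describe("entranceDate=2019-10-01")); // entranceDate: date
    }
}
```

The point of the sketch is that the original String schema plays no part here: once the value only exists as part of a path, all that is left to inspect is the text 2019-10-01, which looks like a date.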

Sometimes users may not want to automatically infer the data types of the partitioning columns. For these use cases, the automatic type inference can be configured by spark.sql.sources.partitionColumnTypeInference.enabled, which is default to true.
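If entranceDate should come back as a String, one option is to switch the inference off before reading. A minimal sketch, assuming the sqlContext and parquetPath variables from the question are in scope and that the setting is applied before the read:

```java
// Assumes an existing SQLContext named sqlContext and the parquetPath used when writing.
sqlContext.setConf("spark.sql.sources.partitionColumnTypeInference.enabled", "false");

// With inference disabled, partition columns are read back as strings.
sqlContext.read().parquet(parquetPath).printSchema();
```

This only changes how the partition columns are typed on read; the files that were already written stay as they are.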

Regarding java - Spark : not understood behaviour when writing to parquet file - datatypes, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58064097/
