
apache-spark - Spark SQL saveAsTable in append mode fails when a new column is added to the Avro schema

Reposted. Author: 行者123. Updated: 2023-12-05 06:36:36

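I am using a Spark SQL Dataset to write data to Hive. It works perfectly when the schema stays the same, but if I change the Avro schema by adding a new column in the middle, it fails with the error below (the schema is supplied by the schema registry).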

Error running job streaming job 1519289340000 ms.0
org.apache.spark.sql.AnalysisException: The column number of the existing table default.sample(struct<collection_timestamp:bigint,managed_object_id:string,managed_object_type:string,if_admin_status:string,date:string,hour:int,quarter:bigint>) doesn't match the data schema(struct<collection_timestamp:bigint,managed_object_id:string,if_oper_status:string,managed_object_type:string,if_admin_status:string,date:string,hour:int,quarter:bigint>);

if_oper_status is the new column that has to be added. Please advise.

// Build the Spark schema from the latest Avro schema in the registry
StructType struct = convertSchemaToStructType(SchemaRegstryClient.getLatestSchema("simple"));
Dataset<Row> dataset = getSparkInstance().createDataFrame(newRDD, struct);

// Derive the partition columns from the current date and time
dataset = dataset.withColumn("date", functions.date_format(functions.current_date(), "dd-MM-yyyy"));
dataset = dataset.withColumn("hour", functions.hour(functions.current_timestamp()));
dataset = dataset.withColumn("quarter", functions.floor(functions.minute(functions.current_timestamp()).divide(5)));

// Append to the Hive table "sample", partitioned by date, hour and quarter
dataset
    .coalesce(1)
    .write().mode(SaveMode.Append)
    .option("charset", "UTF8")
    .partitionBy("date", "hour", "quarter")
    .option("checkpointLocation", "/tmp/checkpoint")
    .saveAsTable("sample");
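The mismatch reported in the error can be confirmed before writing by comparing the column names of the existing table with those of the incoming data. The following is a minimal sketch in plain Java, not part of the original question: the class `SchemaDiff` and the method `missingColumns` are hypothetical names, and the column lists are copied from the error message. In a Spark job the names could presumably be obtained from `StructType.fieldNames()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: finds columns that the incoming data has but the table lacks.
public class SchemaDiff {

    /** Returns the columns in dataColumns that are absent from tableColumns, in data order. */
    public static List<String> missingColumns(List<String> tableColumns, List<String> dataColumns) {
        Set<String> existing = new HashSet<>(tableColumns);
        List<String> missing = new ArrayList<>();
        for (String column : dataColumns) {
            if (!existing.contains(column)) {
                missing.add(column);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Column names taken from the AnalysisException in the question
        List<String> table = Arrays.asList(
            "collection_timestamp", "managed_object_id", "managed_object_type",
            "if_admin_status", "date", "hour", "quarter");
        List<String> data = Arrays.asList(
            "collection_timestamp", "managed_object_id", "if_oper_status",
            "managed_object_type", "if_admin_status", "date", "hour", "quarter");

        System.out.println(missingColumns(table, data)); // prints [if_oper_status]
    }
}
```

A non-empty result means the table definition is behind the registry schema, which is the situation in which the append fails.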

Best answer

I was able to solve this by saving the schema from the registry to a file and setting avro.schema.url to that file's path, as shown below.

Note: this must be done before calling saveAsTable("sample").
dataset.sqlContext().sql(
    "CREATE EXTERNAL TABLE IF NOT EXISTS sample "
    + "PARTITIONED BY (dt STRING, hour STRING, quarter STRING) "
    + "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' "
    + "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' "
    + "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' "
    + "LOCATION 'hdfs://localhost:9000/user/root/sample' "
    + "TBLPROPERTIES ('avro.schema.url'='file://" + file.getAbsolutePath() + "')");
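The answer does not show how the schema file itself is produced. The following is a rough sketch of that step using only the Java standard library. The class `AvroSchemaFile`, its methods, and the temp-file name are assumptions made for illustration, and the schema JSON is a placeholder for whatever string the registry client returns.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: persists an Avro schema so a Hive table can reference it.
public class AvroSchemaFile {

    /** Writes the Avro schema JSON to a temporary .avsc file and returns its path. */
    public static Path writeSchema(String schemaJson) throws IOException {
        Path file = Files.createTempFile("sample-schema", ".avsc");
        Files.write(file, schemaJson.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    /** Builds the TBLPROPERTIES clause that points avro.schema.url at the file. */
    public static String tblProperties(Path file) {
        // toUri() yields a file: URL for the absolute path
        return "TBLPROPERTIES ('avro.schema.url'='" + file.toAbsolutePath().toUri() + "')";
    }

    public static void main(String[] args) throws IOException {
        // Placeholder schema; in the job this would be the latest schema from the registry
        String schemaJson = "{\"type\":\"record\",\"name\":\"sample\",\"fields\":[]}";
        Path file = writeSchema(schemaJson);
        System.out.println(tblProperties(file));
    }
}
```

Rewriting the file whenever the registry schema changes lets the table pick up new columns such as if_oper_status without being recreated. On a multi-node cluster a local file: path is probably only visible on one machine, so a shared location such as HDFS may be the safer choice for avro.schema.url.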

Regarding "apache-spark - Spark SQL saveAsTable in append mode fails when a new column is added to the Avro schema", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48923603/
