
python - How to create a table from an Avro schema (.avsc)?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:48:10

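I have an Avro schema file and need to create a table in Databricks through pyspark. I don't need to load any data; I only want to create the table. The simplest approach is to load the JSON string, take the "name" and "type" of each entry in the fields array, and generate a CREATE SQL query from them. I would like to know whether there is a programmatic way to do this with an API. Sample schema: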

{
"type" : "record",
"name" : "kylosample",
"doc" : "Schema generated by Kite",
"fields" : [ {
"name" : "registration_dttm",
"type" : "string",
"doc" : "Type inferred from '2016-02-03T07:55:29Z'"
}, {
"name" : "id",
"type" : "long",
"doc" : "Type inferred from '1'"
}, {
"name" : "first_name",
"type" : "string",
"doc" : "Type inferred from 'Amanda'"
}, {
"name" : "last_name",
"type" : "string",
"doc" : "Type inferred from 'Jordan'"
}, {
"name" : "email",
"type" : "string",
"doc" : "Type inferred from 'ajordan0@com.com'"
}, {
"name" : "gender",
"type" : "string",
"doc" : "Type inferred from 'Female'"
}, {
"name" : "ip_address",
"type" : "string",
"doc" : "Type inferred from '1.197.201.2'"
}, {
"name" : "cc",
"type" : [ "null", "long" ],
"doc" : "Type inferred from '6759521864920116'",
"default" : null
}, {
"name" : "country",
"type" : "string",
"doc" : "Type inferred from 'Indonesia'"
}, {
"name" : "birthdate",
"type" : "string",
"doc" : "Type inferred from '3/8/1971'"
}, {
"name" : "salary",
"type" : [ "null", "double" ],
"doc" : "Type inferred from '49756.53'",
"default" : null
}, {
"name" : "title",
"type" : "string",
"doc" : "Type inferred from 'Internal Auditor'"
}, {
"name" : "comments",
"type" : "string",
"doc" : "Type inferred from '1E+02'"
} ]
}

Best answer

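This does not seem to be possible through the Python API yet. Here is how I have done it in the past: since you only want to create the table and not load any data, use Spark SQL to create an external table that points at the exported .avsc file. Example: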

spark.sql("""
create external table db.table_name
STORED AS AVRO
LOCATION 'PATH/WHERE/DATA/WILL/BE/STORED'
TBLPROPERTIES('avro.schema.url'='PATH/TO/SCHEMA.avsc')
""")
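
If pointing the table at the .avsc file is not an option, the manual approach described in the question (read the fields array, map each type, emit a CREATE statement) can be sketched in plain Python. This is only a minimal sketch under stated assumptions: the type mapping table, the helper names avro_type_to_sql and create_table_sql, and the decision to support only primitive types and ["null", T] unions are illustrative choices, not part of any Spark or Avro API.

```python
import json

# Assumed mapping from Avro primitive types to Spark SQL column types.
AVRO_TO_SQL = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "bytes": "BINARY",
}


def avro_type_to_sql(avro_type):
    """Map a primitive Avro type, or a ["null", T] union, to a SQL type."""
    if isinstance(avro_type, list):
        # A nullable field is written as a union with "null"; keep the other branch.
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError(f"unsupported union: {avro_type}")
        return avro_type_to_sql(non_null[0])
    if isinstance(avro_type, str) and avro_type in AVRO_TO_SQL:
        return AVRO_TO_SQL[avro_type]
    # Records, arrays, maps, enums and logical types are not handled in this sketch.
    raise ValueError(f"unsupported Avro type: {avro_type}")


def create_table_sql(avsc_text, table_name):
    """Build a CREATE TABLE statement from the text of an .avsc file."""
    schema = json.loads(avsc_text)
    columns = [
        f"`{field['name']}` {avro_type_to_sql(field['type'])}"
        for field in schema["fields"]
    ]
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} (\n  "
        + ",\n  ".join(columns)
        + "\n)"
    )
```

In a notebook, the resulting string could then be passed to spark.sql(...), for example spark.sql(create_table_sql(avsc_text, "db.table_name")), where avsc_text holds the contents of the schema file.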

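It looks like the native Scala API in Spark 2.4 can now read a .avsc schema. Since you are on Databricks, you can switch the language of a notebook cell with a magic command such as %scala, %python, or %sql. Scala example: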

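import java.io.File

import org.apache.avro.Schema

// Parse the .avsc file into an Avro Schema object
val schema = new Schema.Parser().parse(new File("user.avsc"))

// Read the Avro data, applying the parsed schema
spark
  .read
  .format("avro")
  .option("avroSchema", schema.toString)
  .load("/tmp/episodes.avro")
  .show()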
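
For a notebook that stays in %python, a rough PySpark counterpart of the Scala snippet above might look like the following. It relies on the avroSchema option of the Avro data source, which is covered in the reference documentation for that source. The helper name read_avro_with_schema and its parameters are hypothetical, and the sketch assumes the .avsc file is readable from the driver's local filesystem (on Databricks that may mean a /dbfs/... path, which should be verified for your workspace).

```python
import json
from pathlib import Path


def read_avro_with_schema(spark, avsc_path, data_path):
    """Read Avro data with a schema taken from an .avsc file.

    `spark` is expected to be an active SparkSession, such as the one
    predefined in a Databricks notebook.
    """
    schema_json = Path(avsc_path).read_text()
    json.loads(schema_json)  # fail early if the schema file is not valid JSON
    return (
        spark.read.format("avro")
        .option("avroSchema", schema_json)  # schema is passed as a JSON string
        .load(data_path)
    )
```

Usage would be along the lines of read_avro_with_schema(spark, "user.avsc", "/tmp/episodes.avro").show(), mirroring the paths used in the Scala example.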

Spark 2.4 Avro integration reference documentation:

https://spark.apache.org/docs/latest/sql-data-sources-avro.html#configuration

https://databricks.com/blog/2018/11/30/apache-avro-as-a-built-in-data-source-in-apache-spark-2-4.html

Regarding "python - How to create a table from an Avro schema (.avsc)?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56432911/
