
json - How do I parse JSON with a nested schema?

Reposted. Author: 行者123. Last updated: 2023-12-05 04:00:44

Suppose the schema of my JSON is:

root
 |-- data: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: string (containsNull = true)

The JSON looks like this:

{
  "data": [
    [10429183, "4057F5BE-1933-415E-9AF7-D3CAAC5ED8E6", 10429183, 1454527245, "386824", 1454527245, "386824", null,
     "6702002", "HM193685", "2006-02-21T21:00:00", "078XX S VERNON AVE", "2092", "NARCOTICS",
     "SOLICIT NARCOTICS ON PUBLICWAY", "STREET", true, false, "0624", "006", "6", "69", "26", null, null,
     "2006", "2015-08-17T15:03:40", null, null, [null, null, null, null, null]]
  ]
}

My attempt:

val df2 = df1
  .withColumn("data", explode(array(jsonElements: _*)))
  .withColumn("id", $"data"(0))
  .select("data.*")

Error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Can only star expand struct data types. Attribute: ArrayBuffer(data);

Do I need to create a DataFrame for each data element?

Best answer

As I understand it, you are trying to split each JSON element in the array into its own column.
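For context, the AnalysisException in the question comes from select("data.*"): star expansion is defined for struct columns only, and data is still an array after explode. The following is a minimal sketch of that distinction, assuming a local Spark session and a made-up three-element array standing in for the real record. The object name StarExpandCheck and the helper exploded are hypothetical and are not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{ArrayType, StructType}

object StarExpandCheck {
  // Builds a tiny stand-in for the question's data (array<array<string>>)
  // and explodes the outer array, leaving one array<string> per row.
  def exploded(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(Seq(Seq("a", "b", "c"))).toDF("data")
      .withColumn("data", explode(col("data")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("star-expand-check").getOrCreate()
    val df = exploded(spark)

    // The exploded column is an array, not a struct, so "data.*" cannot expand it.
    val dataType = df.schema("data").dataType
    println(s"is array: ${dataType.isInstanceOf[ArrayType]}, is struct: ${dataType.isInstanceOf[StructType]}")

    // Array elements are addressed by position instead.
    df.select(col("data")(0).as("first")).show(false)
    spark.stop()
  }
}
```

Checking the column type this way before calling select("data.*") shows whether positional indexing is needed instead.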

One way to do it is as follows:

import org.apache.spark.sql._

object JsonTest extends App {
  val jsonStr =
    """
      |{
      |  "data": [
      |    [
      |      10429183,
      |      "4057F5BE-1933-415E-9AF7-D3CAAC5ED8E6",
      |      10429183,
      |      1454527245,
      |      "386824",
      |      1454527245,
      |      "386824",
      |      null,
      |      "6702002",
      |      "HM193685",
      |      "2006-02-21T21:00:00",
      |      "078XX S VERNON AVE",
      |      "2092",
      |      "NARCOTICS",
      |      "SOLICIT NARCOTICS ON PUBLICWAY",
      |      "STREET",
      |      true,
      |      false,
      |      "0624",
      |      "006",
      |      "6",
      |      "69",
      |      "26",
      |      null,
      |      null,
      |      "2006",
      |      "2015-08-17T15:03:40",
      |      null,
      |      null,
      |      [
      |        null,
      |        null,
      |        null,
      |        null,
      |        null
      |      ]
      |    ]
      |  ]
      |}
    """.stripMargin

  private[this] implicit val spark = SparkSession.builder().master("local[*]").getOrCreate()

  spark.sparkContext.setLogLevel("ERROR")

  import org.apache.spark.sql.functions._
  import spark.implicits._

  // Read the JSON string as a single record; "data" is inferred as array<array<string>>.
  val df1 = spark.read.json(Seq(jsonStr).toDS)
  println("before explode")
  df1.show(false)
  println(df1.schema)
  println("after explode")
  // import org.apache.spark.sql.functions.schema_of_json
  // val schema = df1.select(schema_of_json($"data")).as[String].first
  // df1.withColumn("jsonData", from_json($"data", schema, Map[String, String]())).show

  // explode produces one row per inner array, so "data" becomes array<string>.
  val df2 = df1
    .withColumn("data", explode(col("data")))
  println(df2.schema)
  df2.show(false)

  // Turn each array position into its own column, named data2, data3, and so on.
  // The sample array has 30 elements; positions 30 to 34 are past the end and
  // come back as null in this run (columns data32 to data36 in the result).
  val nElements = 35
  df2.select(Range(0, nElements).map(idx => $"data"(idx) as "data" + (idx + 2)): _*).show(false)

}

Result:

before explode+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+|data                                                                                                                                                                                                                                                                                                                    |+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+|[[10429183, 4057F5BE-1933-415E-9AF7-D3CAAC5ED8E6, 10429183, 1454527245, 386824, 1454527245, 386824,, 6702002, HM193685, 2006-02-21T21:00:00, 078XX S VERNON AVE, 2092, NARCOTICS, SOLICIT NARCOTICS ON PUBLICWAY, STREET, true, false, 0624, 006, 6, 69, 26,,, 2006, 2015-08-17T15:03:40,,, [null,null,null,null,null]]]|+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+StructType(StructField(data,ArrayType(ArrayType(StringType,true),true),true))after 
explodeStructType(StructField(data,ArrayType(StringType,true),true))+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+|data                                                                                                                                                                                                                                                                                                                  |+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+|[10429183, 4057F5BE-1933-415E-9AF7-D3CAAC5ED8E6, 10429183, 1454527245, 386824, 1454527245, 386824,, 6702002, HM193685, 2006-02-21T21:00:00, 078XX S VERNON AVE, 2092, NARCOTICS, SOLICIT NARCOTICS ON PUBLICWAY, STREET, true, false, 0624, 006, 6, 69, 26,,, 2006, 2015-08-17T15:03:40,,, [null,null,null,null,null]]||data2   |data3                               |data4   |data5     |data6 |data7     |data8 |data9|data10 |data11  |data12             |data13            |data14|data15   |data16                        |data17|data18|data19|data20|data21|data22|data23|data24|data25|data26|data27|data28             |data29|data30|data31                    
|data32|data33|data34|data35|data36|+--------+------------------------------------+--------+----------+------+----------+------+-----+-------+--------+-------------------+------------------+------+---------+------------------------------+------+------+------+------+------+------+------+------+------+------+------+-------------------+------+------+--------------------------+------+------+------+------+------+|10429183|4057F5BE-1933-415E-9AF7-D3CAAC5ED8E6|10429183|1454527245|386824|1454527245|386824|null |6702002|HM193685|2006-02-21T21:00:00|078XX S VERNON AVE|2092  |NARCOTICS|SOLICIT NARCOTICS ON PUBLICWAY|STREET|true  |false |0624  |006   |6     |69    |26    |null  |null  |2006  |2015-08-17T15:03:40|null  |null  |[null,null,null,null,null]|null  |null  |null  |null  |null  |+--------+------------------------------------+--------+----------+------+----------+------+-----+-------+--------+-------------------+------------------+------+---------+------------------------------+------+------+------+------+------+------+------+------+------+------+------+-------------------+------+------+--------------------------+------+------+------+------+------+
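
The answer hard-codes nElements = 35, which is more than the 30 elements in the sample array, so the last five columns are all null. If the array length is not known in advance, one option is to derive it from the data. The sketch below is only an illustration under that assumption: the names DynamicColumns, splitArray and the prefix argument are hypothetical, and the sample rows are made up. The when guard is there because what happens when an index is past the end of an array can depend on Spark's ANSI settings.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, max, size, when}

object DynamicColumns {
  // Splits an array column into one column per position, using the longest
  // array in the data to decide how many columns to create.
  def splitArray(df: DataFrame, arrayCol: String, prefix: String): DataFrame = {
    val row = df.agg(max(size(col(arrayCol)))).head()
    // The aggregate is null when the DataFrame is empty.
    val n = if (row.isNullAt(0)) 0 else row.getInt(0)
    val cols = (0 until n).map { i =>
      // Only index into the array when position i exists; otherwise the value is null.
      when(size(col(arrayCol)) > i, col(arrayCol)(i)).as(s"$prefix$i")
    }
    df.select(cols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("dynamic-columns").getOrCreate()
    import spark.implicits._

    // Two inner arrays of different lengths, in the same nested shape as the question's JSON.
    val json = """{"data": [["a", "b", null], ["c", "d", "e", "f"]]}"""
    val df1 = spark.read.json(Seq(json).toDS)
    val df2 = df1.withColumn("data", explode(col("data")))

    splitArray(df2, "data", "col").show(false)
    spark.stop()
  }
}
```

Note that computing the maximum size triggers an extra pass over the data, so for a fixed, known layout a constant element count as in the answer is cheaper.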

You can then rename the columns (for example with withColumnRenamed) and drop any columns you do not need.
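
As a rough sketch of that last step, the snippet below renames two of the generated columns and drops a third. The target names id and case_uuid are purely illustrative assumptions (the original post does not say what the fields mean), and the object name RenameAndDrop and the helper tidy are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RenameAndDrop {
  // Renames two generated columns and drops one that is not needed.
  // The new names are placeholders chosen for this example only.
  def tidy(df: DataFrame): DataFrame =
    df.withColumnRenamed("data2", "id")
      .withColumnRenamed("data3", "case_uuid")
      .drop("data4")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rename-and-drop").getOrCreate()
    import spark.implicits._

    // A three-column stand-in for the wide DataFrame produced by the answer.
    val wide = Seq(("10429183", "4057F5BE-1933-415E-9AF7-D3CAAC5ED8E6", "10429183"))
      .toDF("data2", "data3", "data4")

    tidy(wide).show(false)
    spark.stop()
  }
}
```

When every column needs a new name, passing the full list to toDF in one call is an alternative to chaining many withColumnRenamed calls.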

Regarding "json - How do I parse JSON with a nested schema?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55974244/
