scala - Spark Dataframe - How to access a json structure

Reposted. Author: 可可西里. Updated: 2023-11-01 16:23:11

I have a JSON file like this:

{
"employeeDetails":{
"name": "xxxx",
"num":"415"
},
"work":[
{
"monthYear":"01/2007",
"workdate":"1|2|3|....|31",
"workhours":"8|8|8....|8"
},
{
"monthYear":"02/2007",
"workdate":"1|2|3|....|31",
"workhours":"8|8|8....|8"
}
]
}

I need to get the work dates and work hours from this JSON data.

I am using Spark 2.1.1.

I tried this:

     val spark = SparkSession.builder().appName("SQL-JSON").master("local[4]").getOrCreate()

val df = spark.read.json(spark.sparkContext.wholeTextFiles("sample22.json").values)
// df.show()
// df.printSchema()

//val gatewayMessageContent = df.select("employeeDetails")
//gatewayMessageContent.printSchema()
val sensorMessagesContent = df.select("work")
sensorMessagesContent.printSchema()

// I am following an article online that shows it like this, but it does not work for me.
val flattened = df.select( $"root", explode($"work").as("work_flat"))

I get an exception like this:

Error:(22, 31) value $ is not a member of StringContext
val flattened = df.select($"root", explode($"work").as("work_flat"))
^
Error:(22, 48) value $ is not a member of StringContext
val flattened = df.select($"root", explode($"work").as("work_flat"))
^

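In that example, he shows a top-level "name". But in my case I do not have any top-level element above "work", so it does not work.

I am new to Spark.

Best answer

You should use Spark's withColumn function, as follows: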

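// $"..." needs spark.implicits._ in scope; without it: "value $ is not a member of StringContext"
import spark.implicits._
import org.apache.spark.sql.functions.struct
val flattened = df.withColumn("workDate", struct($"work.workdate"))
  .withColumn("workHours", struct($"work.workhours"))
flattened.show(false)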

You should get the following output:

+---------------+--------------------------------------------------------------------------+--------------------------------------------+----------------------------------------+
|employeeDetails|work |workDate |workHours |
+---------------+--------------------------------------------------------------------------+--------------------------------------------+----------------------------------------+
|[xxxx,415] |[[01/2007,1|2|3|....|31,8|8|8....|8], [02/2007,1|2|3|....|31,8|8|8....|8]]|[WrappedArray(1|2|3|....|31, 1|2|3|....|31)]|[WrappedArray(8|8|8....|8, 8|8|8....|8)]|
+---------------+--------------------------------------------------------------------------+--------------------------------------------+----------------------------------------+
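If one row per month is preferred over the array-wrapped structs shown above, the explode approach from the question can also be made to work. The following is only a sketch under a few assumptions: the inline `sampleJson` is a shortened, hypothetical stand-in for sample22.json, and the names `FlattenWork` and `flattenWork` are made up for illustration. Note that `root` in the printSchema output is only the label of the schema tree, not a column, so it is left out of the select.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.explode

object FlattenWork {
  // Shortened, hypothetical stand-in for the contents of sample22.json.
  val sampleJson: String =
    """{"employeeDetails":{"name":"xxxx","num":"415"},"work":[""" +
    """{"monthYear":"01/2007","workdate":"1|2|3","workhours":"8|8|8"},""" +
    """{"monthYear":"02/2007","workdate":"1|2|3","workhours":"8|8|8"}]}"""

  // Turns each element of the "work" array into its own row and
  // pulls the struct fields up into top-level columns.
  def flattenWork(spark: SparkSession, df: DataFrame): DataFrame = {
    // Brings the $"..." column interpolator into scope.
    import spark.implicits._
    df.select(explode($"work").as("work_flat"))
      .select($"work_flat.monthYear", $"work_flat.workdate", $"work_flat.workhours")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SQL-JSON").master("local[4]").getOrCreate()
    // Each RDD element is parsed as one JSON document.
    val df = spark.read.json(spark.sparkContext.parallelize(Seq(sampleJson)))
    flattenWork(spark, df).show(false)
    spark.stop()
  }
}
```

With the sample above, the result has one row per entry of the work array, with monthYear, workdate and workhours as plain string columns.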

The withColumn snippet above assumes you already have a dataframe with the following schema:

root
 |-- work: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- monthYear: string (nullable = true)
 |    |    |-- workdate: string (nullable = true)
 |    |    |-- workhours: string (nullable = true)
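Since workdate and workhours are plain strings in this schema, the individual days and hours are still pipe-delimited after extraction. A minimal plain-Scala sketch of pairing them up follows, assuming both strings hold the same number of integer entries; the names `WorkEntries` and `pairDatesWithHours` are hypothetical. The `Char` overload of `split` is used because `split("|")` would treat the pipe as a regular-expression alternation.

```scala
object WorkEntries {
  // Splits two pipe-delimited strings and pairs each work date with its hours.
  // Assumes every entry is an integer; elided values such as "...." would fail to parse.
  def pairDatesWithHours(workdate: String, workhours: String): List[(Int, Int)] = {
    val dates = workdate.split('|').map(_.trim.toInt)
    val hours = workhours.split('|').map(_.trim.toInt)
    // zip stops at the shorter of the two arrays.
    dates.zip(hours).toList
  }
}
```

For example, `WorkEntries.pairDatesWithHours("1|2|3", "8|8|8")` pairs day 1 with 8 hours, day 2 with 8 hours, and so on. Inside Spark, the same split could be done on the columns with the built-in `split` function from `org.apache.spark.sql.functions`, using the escaped pattern `"\\|"`.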

Regarding scala - Spark Dataframe - How to access a json structure, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44825021/
