
json - Scala Spark - Creating nested JSON output from a simple dataframe

Reposted. Author: 行者123. Updated: 2023-12-02 01:26:14

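Thanks for getting back to me. The problem I am facing is writing these structures out as nested JSON. Somehow "toJSON" does not work: it simply skips the nested fields, so the result is always a flat structure. How can I write nested JSON to HDFS?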

Best answer

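You should create struct fields from the fields that must be nested together. Here is a working example: suppose you have employee data in CSV format containing the company name, the department name, and the employee name, and you want to list, as JSON, all employees of each department of each company. The code for that follows.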

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonExample {
    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession
                .builder()
                .appName("JsonExample")
                .master("local")
                .getOrCreate();

        // Read the CSV file; the header row supplies the column names
        // (the queries below expect company, department and name).
        Dataset<Row> employees = sparkSession.read().option("header", "true").csv("/tmp/data/emp.csv");
        // Register the data as a temp view so it can be queried with SQL.
        employees.createOrReplaceTempView("employees");

        // First, group the employees by company AND department.
        // Note: this replaces the "employees" view with the grouped result.
        sparkSession.sql("select company,department,collect_list(name) as department_employees from employees group by company,department").createOrReplaceTempView("employees");

        /* Now build a struct with the built-in struct() SQL function (no UDF is needed).
         * The struct contains the department and its list of employees.
         * toJSON() renders each row as one JSON string; show(false) prints it untruncated.
         * To persist the result instead of printing it, call write().json(path) on the
         * Dataset returned by sql(); the nested struct columns are kept in the output.
         */
        sparkSession.sql("select company,collect_list(struct(department,department_employees)) as department_info from employees group by company").toJSON().show(false);
    }
}
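
To make the shape of the result easier to see without running Spark, the two-stage grouping above can be mimicked with plain Java collections. This is only an illustrative sketch, not part of the original answer: the class name NestedJsonSketch, the helper toNestedJson and the sample rows are made up for this example, and the JSON is assembled by hand rather than by Spark. The field names mirror the column aliases used in the SQL (company, department_info, department, department_employees). Spark does not guarantee the row order of a group by, whereas this sketch keeps first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedJsonSketch {

    // Groups {company, department, name} rows into company -> department -> names,
    // preserving first-seen order, then renders one JSON object per company.
    public static List<String> toNestedJson(List<String[]> rows) {
        Map<String, Map<String, List<String>>> byCompany = new LinkedHashMap<>();
        for (String[] row : rows) {
            byCompany
                .computeIfAbsent(row[0], k -> new LinkedHashMap<>())
                .computeIfAbsent(row[1], k -> new ArrayList<>())
                .add(row[2]);
        }

        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Map<String, List<String>>> company : byCompany.entrySet()) {
            List<String> departments = new ArrayList<>();
            for (Map.Entry<String, List<String>> dept : company.getValue().entrySet()) {
                List<String> names = new ArrayList<>();
                for (String name : dept.getValue()) {
                    names.add(quote(name));
                }
                // Inner object: plays the role of struct(department, department_employees).
                departments.add("{\"department\":" + quote(dept.getKey())
                    + ",\"department_employees\":[" + String.join(",", names) + "]}");
            }
            // Outer object: one per company, holding the list of department structs.
            lines.add("{\"company\":" + quote(company.getKey())
                + ",\"department_info\":[" + String.join(",", departments) + "]}");
        }
        return lines;
    }

    // Minimal JSON string quoting; escapes only backslash and double quote.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // Hypothetical sample rows standing in for the CSV file.
        List<String[]> rows = Arrays.asList(
            new String[] {"Acme", "Sales", "Ann"},
            new String[] {"Acme", "Sales", "Bob"},
            new String[] {"Acme", "IT", "Cid"});
        toNestedJson(rows).forEach(System.out::println);
    }
}
```

Each printed line is one company object whose department_info array holds one object per department, which is the nesting that collect_list(struct(...)) is meant to produce.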

You can find the same example on my blog: http://baahu.in/spark-how-to-generate-nested-json-using-dataset/

This question, "json - Scala Spark - Creating nested JSON output from a simple dataframe", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/38188115/
