
apache-spark - How to aggregate columns into a JSON array?

Repost. Author: 行者123 Updated: 2023-12-04 00:32:21

How can I transform data like the following so that it can be stored in ElasticSearch?

This is a Dataset of beans, which I want to aggregate into one JSON array per product.

List<Bean> data = new ArrayList<Bean>();
data.add(new Bean("book","John",59));
data.add(new Bean("book","Björn",61));
data.add(new Bean("tv","Roger",36));
Dataset<Row> ds = spark.createDataFrame(data, Bean.class);

ds.show(false);

+------+-------+---------+
|amount|product|purchaser|
+------+-------+---------+
|59 |book |John |
|61 |book |Björn |
|36 |tv |Roger |
+------+-------+---------+
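
The `Bean` class itself is not shown in the question. Since `createDataFrame(data, Bean.class)` infers the schema from the bean's getters, a minimal version (assumed here, with field names matching the columns above) might look like:

```java
import java.io.Serializable;

// Minimal JavaBean assumed to back the Dataset above. Spark derives the
// columns (amount, product, purchaser) from the getter names, sorted
// alphabetically, which matches the show() output in the question.
public class Bean implements Serializable {
    private String product;
    private String purchaser;
    private int amount;

    // No-arg constructor required for bean-style schema inference.
    public Bean() {}

    public Bean(String product, String purchaser, int amount) {
        this.product = product;
        this.purchaser = purchaser;
        this.amount = amount;
    }

    public String getProduct() { return product; }
    public void setProduct(String product) { this.product = product; }
    public String getPurchaser() { return purchaser; }
    public void setPurchaser(String purchaser) { this.purchaser = purchaser; }
    public int getAmount() { return amount; }
    public void setAmount(int amount) { this.amount = amount; }
}
```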


ds = ds.groupBy(col("product"))
    .agg(collect_list(map(ds.col("purchaser"), ds.col("amount")).as("map")));
ds.show(false);

+-------+---------------------------------------------+
|product|collect_list(map(purchaser, amount) AS `map`)|
+-------+---------------------------------------------+
|tv |[[Roger -> 36]] |
|book |[[John -> 59], [Björn -> 61]] |
+-------+---------------------------------------------+

This is what I want to transform it into:

+-------+------------------------------------------------------------------+
|product|json |
+-------+------------------------------------------------------------------+
|tv |[{purchaser: "Roger", amount:36}] |
|book   |[{purchaser: "John", amount:59}, {purchaser: "Björn", amount:61}] |
+-------+------------------------------------------------------------------+
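
To make the target shape concrete, here is a plain-Java sketch (no Spark) of the same aggregation: group rows by product and render each row as a JSON object string. The `Row` record and `aggregate` method are illustrative names, not part of the question; in Spark this is what `collect_list(to_json(struct(...)))` does per group.

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java illustration of the desired aggregation: one JSON object
// string per row, collected into a list per product.
public class JsonAggregateSketch {
    record Row(String product, String purchaser, int amount) {}

    static Map<String, List<String>> aggregate(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
            Row::product,
            LinkedHashMap::new,
            Collectors.mapping(
                r -> String.format("{\"purchaser\":\"%s\",\"amount\":%d}",
                                   r.purchaser(), r.amount()),
                Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Row> data = List.of(
            new Row("book", "John", 59),
            new Row("book", "Björn", 61),
            new Row("tv", "Roger", 36));
        aggregate(data).forEach((product, json) ->
            System.out.println(product + " -> " + json));
    }
}
```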

Best Answer

Solution:

ds = ds.groupBy(col("product"))
    .agg(collect_list(to_json(struct(col("purchaser"), col("amount")))).alias("json"));

For this question (apache-spark - How to aggregate columns into a JSON array?), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49316724/
