gpt4 book ai didi

group-by - Spark Group通过agg collect_list多列

转载 作者:行者123 更新时间:2023-12-04 12:51:30 25 4
gpt4 key购买 nike

我有一个类似于this的问题,但由collect_list操作的列数由名称列表给出。例如:

scala> w.show
+---+-----+----+-----+
|iid|event|date|place|
+---+-----+----+-----+
| A| D1| T0| P1|
| A| D0| T1| P2|
| B| Y1| T0| P3|
| B| Y2| T2| P3|
| C| H1| T0| P5|
| C| H0| T9| P5|
| B| Y0| T1| P2|
| B| H1| T3| P6|
| D| H1| T2| P4|
+---+-----+----+-----+


scala> val combList = List("event", "date", "place")
combList: List[String] = List(event, date, place)

scala> val v = w.groupBy("iid").agg(collect_list(combList(0)), collect_list(combList(1)), collect_list(combList(2)))
v: org.apache.spark.sql.DataFrame = [iid: string, collect_list(event): array<string> ... 2 more fields]

scala> v.show
+---+-------------------+------------------+-------------------+
|iid|collect_list(event)|collect_list(date)|collect_list(place)|
+---+-------------------+------------------+-------------------+
| B| [Y1, Y2, Y0, H1]| [T0, T2, T1, T3]| [P3, P3, P2, P6]|
| D| [H1]| [T2]| [P4]|
| C| [H1, H0]| [T0, T9]| [P5, P5]|
| A| [D1, D0]| [T0, T1]| [P1, P2]|
+---+-------------------+------------------+-------------------+

有什么方法可以将collect_list应用于agg内的多个列,而无需事先知道combList中的元素数?

最佳答案

您可以使用collect_list(struct(col1,col2))AS元素。

例子:

df.select("cd_issuer", "cd_doc", "cd_item", "nm_item").printSchema
val outputDf = spark.sql(s"SELECT cd_issuer, cd_doc, collect_list(struct(cd_item, nm_item)) AS item FROM teste GROUP BY cd_issuer, cd_doc")
outputDf.printSchema

df
|-- cd_issuer: string (nullable = true)
|-- cd_doc: string (nullable = true)
|-- cd_item: string (nullable = true)
|-- nm_item: string (nullable = true)

outputDf
|-- cd_issuer: string (nullable = true)
|-- cd_doc: string (nullable = true)
|-- item: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- cd_item: string (nullable = true)
| | |-- nm_item: string (nullable = true)

关于group-by - Spark Group通过agg collect_list多列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/48759603/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com