
apache-spark - Spark combine columns as nested array


How can I combine columns in Spark into a nested array?

// needed for .toDF on a local Seq (already in scope in spark-shell)
import spark.implicits._

val inputSmall = Seq(
  ("A", 0.3, "B", 0.25),
  ("A", 0.3, "g", 0.4),
  ("d", 0.0, "f", 0.1),
  ("d", 0.0, "d", 0.7),
  ("A", 0.3, "d", 0.7),
  ("d", 0.0, "g", 0.4),
  ("c", 0.2, "B", 0.25)).toDF("column1", "transformedCol1", "column2", "transformedCol2")

Something like:

+-------+---------------+---------------+----------+
|column1|transformedCol1|transformedCol2|  combined|
+-------+---------------+---------------+----------+
|      A|            0.3|            0.3|[0.3, 0.3]|
+-------+---------------+---------------+----------+

Best answer

If you want to combine multiple columns into a single new ArrayType column, you can use the array function:

import org.apache.spark.sql.functions._
val result = inputSmall.withColumn("combined", array($"transformedCol1", $"transformedCol2"))
result.show()

+-------+---------------+-------+---------------+-----------+
|column1|transformedCol1|column2|transformedCol2| combined|
+-------+---------------+-------+---------------+-----------+
| A| 0.3| B| 0.25|[0.3, 0.25]|
| A| 0.3| g| 0.4| [0.3, 0.4]|
| d| 0.0| f| 0.1| [0.0, 0.1]|
| d| 0.0| d| 0.7| [0.0, 0.7]|
| A| 0.3| d| 0.7| [0.3, 0.7]|
| d| 0.0| g| 0.4| [0.0, 0.4]|
| c| 0.2| B| 0.25|[0.2, 0.25]|
+-------+---------------+-------+---------------+-----------+
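As a quick sanity check, you can print the schema to confirm that combined is an ArrayType of doubles. And if you really need a nested array (an array of arrays), nesting calls to array works as well. A minimal sketch building on the snippet above; the column name "nested" is just an illustrative choice:

// Assumes inputSmall, result, and the functions._ import from above are in scope
result.printSchema()  // combined should appear as array (element: double)

// Build an array-of-arrays column by nesting array()
val nested = inputSmall.withColumn(
  "nested",
  array(array($"transformedCol1"), array($"transformedCol2"))
)
nested.select($"column1", $"nested").show(2, false)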

Regarding apache-spark - Spark combine columns as nested array, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41239887/
