
scala - How to convert Spark DataFrame columns into a single column of string arrays

Reposted. Author: 行者123. Updated: 2023-12-01 08:51:24

I would like to know how to "merge" several DataFrame columns into a single array of strings.

For example, I have this DataFrame:

val df = sqlContext.createDataFrame(Seq((1, "Jack", "125", "Text"), (2,"Mary", "152", "Text2"))).toDF("Id", "Name", "Number", "Comment")

which looks like this:
scala> df.show
+---+----+------+-------+
| Id|Name|Number|Comment|
+---+----+------+-------+
|  1|Jack|   125|   Text|
|  2|Mary|   152|  Text2|
+---+----+------+-------+

scala> df.printSchema
root
 |-- Id: integer (nullable = false)
 |-- Name: string (nullable = true)
 |-- Number: string (nullable = true)
 |-- Comment: string (nullable = true)

How can I transform it so that it looks like this:
scala> df.show
+---+-----------------+
| Id|             List|
+---+-----------------+
|  1|  [Jack,125,Text]|
|  2| [Mary,152,Text2]|
+---+-----------------+

scala> df.printSchema
root
 |-- Id: integer (nullable = false)
 |-- List: array (nullable = true)
 |    |-- element: string (containsNull = true)

Best answer

Use org.apache.spark.sql.functions.array:

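import org.apache.spark.sql.functions._
import sqlContext.implicits._ // enables the $"colName" syntax (spark-shell imports this automatically)

// Keep Id and pack the three string columns into one array column named "List"
val result = df.select($"Id", array($"Name", $"Number", $"Comment") as "List")

result.show(false) // false = do not truncate cell contents
// +---+------------------+
// |Id |List              |
// +---+------------------+
// |1  |[Jack, 125, Text] |
// |2  |[Mary, 152, Text2]|
// +---+------------------+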
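If the columns to combine are not known in advance, the same `array` call can be built from the DataFrame's column names instead of listing them by hand. The sketch below assumes Spark 2.x or later with a local `SparkSession`; the helper name `packColumns`, the object name, the app name, and the `local[*]` master are hypothetical choices made for illustration. Each column is explicitly cast to `string` before being passed to `array`, so that all elements share one type without relying on Spark's implicit type coercion (for example, if `Number` were stored as an integer). In the question's data every combined column is already a string, so the casts are no-ops there.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col}

object ColumnsToArray {
  // Hypothetical helper: keeps `idCol` as-is and packs every other column,
  // cast to string and in df.columns order, into one array column `outCol`.
  def packColumns(df: DataFrame, idCol: String, outCol: String): DataFrame = {
    val others = df.columns.toSeq
      .filterNot(_ == idCol)
      .map(c => col(c).cast("string"))
    df.select(col(idCol), array(others: _*).as(outCol))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the sketch out
    val spark = SparkSession.builder()
      .appName("columns-to-array")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same rows and column names as in the question
    val df = Seq((1, "Jack", "125", "Text"), (2, "Mary", "152", "Text2"))
      .toDF("Id", "Name", "Number", "Comment")

    val result = packColumns(df, "Id", "List")
    result.show(false)
    result.printSchema()

    spark.stop()
  }
}
```

The element order inside each array follows `df.columns`, so selecting or reordering columns before calling the helper changes the order of the packed values.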

Regarding "scala - How to convert Spark DataFrame columns into a single column of string arrays", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41021445/
