scala - A better way to concatenate many columns?

I have 30 columns. 26 of the column names are the letters of the English alphabet. I want to put those 26 columns into a single column, as one string.

price  dateCreate  volume  country  A  B  C  D  E  .....  Z
19     20190501    25      US       1  2  5  6  19        30
49     20190502    30      US       5  4  5  0  34        50

I want this:

price  dateCreate  volume  country  new_col
19     20190501    25      US       "1,2,5,6,19,....30"
49     20190502    30      US       "5,4,5,0,34,50"

I know I could do something like this:

df.withColumn("new_col", concat($"A", $"B", ...$"Z"))

However, I'd like an easier way to concatenate many columns the next time this problem comes up. Is there one?

Best answer

Just apply the following to however many columns you want to concatenate:

import org.apache.spark.sql.functions.{array, col, concat_ws}
// in spark-shell; in an application also import spark.implicits._ for toDF and $

val df = Seq(
  (19, 20190501, 24, "US", 1, 2, 5, 6, 19),
  (49, 20190502, 30, "US", 5, 4, 5, 0, 34)
).toDF("price", "dataCreate", "volume", "country", "A", "B", "C", "D", "E")

// everything after the first four columns (here A..E) gets concatenated
val exprs = df.columns.drop(4).map(col _)

df.select($"price", $"dataCreate", $"volume", $"country",
  concat_ws(",", array(exprs: _*)).as("new_col")).show()


+-----+----------+------+-------+----------+
|price|dataCreate|volume|country| new_col|
+-----+----------+------+-------+----------+
| 19| 20190501| 24| US|1,2,5,6,19|
| 49| 20190502| 30| US|5,4,5,0,34|
+-----+----------+------+-------+----------+
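
Since concat_ws also accepts the columns themselves as varargs, the array wrapper is optional; a minimal sketch of that variant, reusing the exprs built above:

// concat_ws(sep, cols*) joins the given columns with the separator, skipping nulls,
// so the letter columns can be passed without wrapping them in array()
df.select($"price", $"dataCreate", $"volume", $"country",
  concat_ws(",", exprs: _*).as("new_col"))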

For completeness, here is the PySpark equivalent:

import pyspark.sql.functions as F

df = spark.createDataFrame(
    [[19, 20190501, 24, "US", 1, 2, 5, 6, 19],
     [49, 20190502, 30, "US", 5, 4, 5, 0, 34]],
    ["price", "dataCreate", "volume", "country", "A", "B", "C", "D", "E"])

# everything after the first four columns gets concatenated
exprs = df.columns[4:]

df.select("price", "dataCreate", "volume", "country",
          F.concat_ws(",", F.array(*exprs)).alias("new_col"))
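
Either way, the leading columns can also be picked up dynamically instead of being listed by hand; a minimal Scala sketch, assuming the letter columns are everything from the fifth column onward:

import org.apache.spark.sql.functions.{col, concat_ws}

// keep the first four columns as-is and collapse the rest into a single string column
val keepCols   = df.columns.take(4).map(col _)   // price, dataCreate, volume, country
val letterCols = df.columns.drop(4).map(col _)   // A, B, C, D, E, ...
val allCols    = keepCols :+ concat_ws(",", letterCols: _*).as("new_col")

df.select(allCols: _*)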

Regarding "scala - A better way to concatenate many columns?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57098073/
