
Scala - Spark: retrieve the column name holding the maximum value in each row of a DataFrame


I have a DataFrame:

name     column1  column2  column3  column4
first    2        1        2.1      5.4
test     1.5      0.5      0.9      3.7
choose   7        2.9      9.1      2.5

I want a new DataFrame that contains the name column plus, for each row, the name of the column holding that row's maximum value:

| name   | max_column |
|--------|------------|
| first | column4 |
| test | column4 |
| choose | column3 |

Thank you very much for your support.

Best answer

There may be a better way than writing a UDF, but this is a workable solution:

import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder.master("local").getOrCreate

// Implicits for convenience methods such as .toDF
import spark.implicits._

import org.apache.spark.sql.functions.udf

// The number of parameters is hard-coded because a UDF does not accept a variable number of arguments
val maxval = udf((c1: Double, c2: Double, c3: Double, c4: Double) =>
  if (c1 >= c2 && c1 >= c3 && c1 >= c4)
    "column1"
  else if (c2 >= c1 && c2 >= c3 && c2 >= c4)
    "column2"
  else if (c3 >= c1 && c3 >= c2 && c3 >= c4)
    "column3"
  else
    "column4"
)

// Case class describing the schema
case class Record(name: String,
                  column1: Double,
                  column2: Double,
                  column3: Double,
                  column4: Double)

val df = Seq(
  Record("first", 2.0, 1.0, 2.1, 5.4),
  Record("test", 1.5, 0.5, 0.9, 3.7),
  Record("choose", 7.0, 2.9, 9.1, 2.5)
).toDF()

df.withColumn("max_column", maxval($"column1", $"column2", $"column3", $"column4"))
  .select("name", "max_column").show

Output

+------+----------+
| name|max_column|
+------+----------+
| first| column4|
| test| column4|
|choose| column3|
+------+----------+
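
As noted above, a UDF may not be necessary at all. A minimal alternative sketch using only Spark's built-in greatest and when functions is shown below; the names valueCols, rowMax and maxColumn are introduced here for illustration and are not part of the original answer. Note that on ties the last matching column wins, whereas the UDF above prefers the first.

import org.apache.spark.sql.functions.{col, greatest, lit, when}

// Value columns to compare (assumed to match the DataFrame above)
val valueCols = Seq("column1", "column2", "column3", "column4")

// Row-wise maximum across the value columns
val rowMax = greatest(valueCols.map(col): _*)

// Chain of when() expressions that yields the name of the column equal to the row maximum.
// On ties the last column checked wins, unlike the UDF above, which prefers the first.
val maxColumn = valueCols.foldLeft(lit(null).cast("string")) { (acc, c) =>
  when(col(c) === rowMax, lit(c)).otherwise(acc)
}

df.withColumn("max_column", maxColumn)
  .select("name", "max_column")
  .show()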

Regarding "Scala - Spark: retrieve the column name holding the maximum value in each row of a DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42030486/
