
scala - How to update a column based on a condition (a value in the group)?

Reposted. Author: 行者123. Updated: 2023-12-03 22:06:26

I have the following df:

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  fn|  red|
|  2|  fn| blue|
|  3|  fn|green|
+---+----+-----+

If any value in the color column is red, then all values in the color column should be updated to red, as follows:
+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  fn|  red|
|  2|  fn|  red|
|  3|  fn|  red|
+---+----+-----+

I can't figure it out. Please help; I tried the following code:
val gp = jdbcDF.filter($"dept".contains("fn"))
//.withColumn("newone",when($"dept"==="fn","RED").otherwise("NULL"))
gp.show()
gp.map(
  row => {
    val row1 = row.getAs[String](1)
    var row2 = row.getAs[String](2)
    val make = if (row1 == "fn") row2 = "red"
    Row(row(0), row(1), make)
  }
).collect().foreach(println)

Best answer

Given:

val df = Seq(
  (1, "fn", "red"),
  (2, "fn", "blue"),
  (3, "fn", "green"),
  (4, "aa", "blue"),
  (5, "aa", "green"),
  (6, "bb", "red"),
  (7, "bb", "red"),
  (8, "aa", "blue")
).toDF("id", "fn", "color")

Do the calculation:
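import org.apache.spark.sql.functions._ // collect_set, array_contains, when, coalesce

// For each group, collect the distinct colors and flag the groups that contain "red"
val redOrNot = df.groupBy("fn")
  .agg(collect_set('color) as "values")
  .withColumn("hasRed", array_contains('values, "red"))

// when without otherwise yields null where the condition is false
val colorPicker = when('hasRed, "red")
val result = df.join(redOrNot, "fn")
  .withColumn("resultColor", colorPicker)
  .withColumn("color", coalesce('resultColor, 'color)) // falls back to the original color when resultColor is null
  .select('id, 'fn, 'color)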
result looks as follows (which seems to be the answer):
scala> result.show
+---+---+-----+
| id| fn|color|
+---+---+-----+
|  1| fn|  red|
|  2| fn|  red|
|  3| fn|  red|
|  4| aa| blue|
|  5| aa|green|
|  6| bb|  red|
|  7| bb|  red|
|  8| aa| blue|
+---+---+-----+

You can chain when operators and supply a default value with otherwise. Consult the scaladoc of the when operator.
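
As a rough illustration of chaining when with an otherwise default, the sketch below replaces the resultColor/coalesce step with a single expression. It assumes a local SparkSession and a hypothetical frame named joined that already carries the hasRed flag; the extra row with a null color and the "unknown" label are made up purely to show a second when branch.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

// Assumption: a local session, only for trying the snippet out
val spark = SparkSession.builder().master("local[*]").appName("when-otherwise").getOrCreate()
import spark.implicits._

// Hypothetical input: rows already joined with the per-group hasRed flag
val joined = Seq[(Int, String, String, Boolean)](
  (1, "fn", "red", true),
  (2, "fn", "blue", true),
  (4, "aa", "blue", false),
  (5, "aa", "green", false),
  (6, "bb", null, false)
).toDF("id", "fn", "color", "hasRed")

// Branches are tried in order; otherwise supplies the value when none matches
val recolored = joined
  .withColumn(
    "color",
    when(col("hasRed"), "red")
      .when(col("color").isNull, "unknown")
      .otherwise(col("color"))
  )
  .select("id", "fn", "color")

// Collect into a map of id -> color for inspection
val colorsById = recolored.as[(Int, String, String)].collect().map(r => r._1 -> r._3).toMap
```

With otherwise(col("color")) the original color is kept for groups without red, so no separate coalesce step is needed.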

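I think you could use window operators or a user-defined aggregate function (UDAF) to do something very similar (and perhaps more efficiently), but... well... I currently don't know how to do it. Leaving a comment here to motivate others ;-)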
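
One possible way to do this with a window, offered as a sketch rather than as the answerer's method: partition by fn, compute per partition whether any row is red, and rewrite color accordingly. It assumes a local SparkSession; the names byGroup and groupHasRed are illustrative, and whether this is actually faster than the join has not been measured here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, max, when}

// Assumption: a local session, only for trying the snippet out
val spark = SparkSession.builder().master("local[*]").appName("window-red").getOrCreate()
import spark.implicits._

val df = Seq(
  (1, "fn", "red"),
  (2, "fn", "blue"),
  (3, "fn", "green"),
  (4, "aa", "blue"),
  (5, "aa", "green"),
  (6, "bb", "red"),
  (7, "bb", "red"),
  (8, "aa", "blue")
).toDF("id", "fn", "color")

// One window partition per group
val byGroup = Window.partitionBy("fn")

// 1 for red rows, 0 otherwise; the max over the partition is 1 if the group has any red
val groupHasRed = max(when(col("color") === "red", 1).otherwise(0)).over(byGroup) === 1

val result = df
  .withColumn("color", when(groupHasRed, "red").otherwise(col("color")))
  .orderBy("id")

// Collect in id order for inspection
val rows = result.as[(Int, String, String)].collect().toSeq
```

This avoids the separate groupBy and join, since the flag is computed next to each row inside its partition.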

P.S. Learned a lot! Thanks for the idea!

Regarding "scala - How to update a column based on a condition (a value in the group)?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40692025/
