gpt4 book ai didi

scala - 用下一个值填充数据框列中的空值

转载 作者:行者123 更新时间:2023-12-02 04:25:00 25 4
gpt4 key购买 nike

我必须用数据框中同一列的立即值填充第一个空值。此逻辑仅适用于列的第一个连续空值。

我有一个类似于下面的数据框

 //I replaced null to 0 in value column
val df = Seq( (0,"exA",30), (0,"exB",22), (0,"exC",19), (16,"exD",13),
(5,"exE",28), (6,"exF",26), (0,"exG",12), (13,"exH",53))
.toDF("value", "col2", "col3")

scala> df.show(false)
+-----+----+----+
|value|col2|col3|
+-----+----+----+
|0 |exA |30 |
|0 |exB |22 |
|0 |exC |19 |
|16 |exD |13 |
|5 |exE |28 |
|6 |exF |26 |
|0 |exG |12 |
|13 |exH |53 |
+-----+----+----+

从这个数据框我期待如下

scala> df.show(false)
+-----+----+----+
|value|col2|col3|
+-----+----+----+
|16 |exA |30 | // Change the value 0 to 16 at value column
|16 |exB |22 | // Change the value 0 to 16 at value column
|16 |exC |19 | // Change the value 0 to 16 at value column
|16 |exD |13 |
|5 |exE |28 |
|6 |exF |26 |
|0 |exG |12 | // value should not be change here
|13 |exH |53 |
+-----+----+----+

请帮我解决这个问题。

最佳答案

为此您可以使用窗口函数

 val df = Seq( (0,"exA",30), (0,"exB",22), (0,"exC",19), (16,"exD",13),
(5,"exE",28), (6,"exF",26), (0,"exG",12), (13,"exH",53))
.toDF("value", "col2", "col3")
val w = Window.orderBy($"col2".desc)
df.withColumn("Result", last(when($"value" === 0, null).otherwise($"value"), ignoreNulls = true).over(w))
.orderBy($"col2")
.show(10)

将导致

+-----+----+----+------+
|value|col2|col3|Result|
+-----+----+----+------+
| 0| exA| 30| 16|
| 0| exB| 22| 16|
| 0| exC| 19| 16|
| 16| exD| 13| 16|
| 5| exE| 28| 5|
| 6| exF| 26| 6|
| 0| exG| 12| 13|
| 13| exH| 53| 13|
+-----+----+----+------+

表达式 df.orderBy($"col2") 只需要以正确的顺序显示最终结果。如果您不关心最终顺序,可以跳过它。

更新为了准确地获得您需要的东西,您应该使用更复杂的代码

val w = Window.orderBy($"col2")
val w2 = Window.orderBy($"col2".desc)
df.withColumn("IntermediateResult", first(when($"value" === 0, null).otherwise($"value"), ignoreNulls = true).over(w))
.withColumn("Result", when($"IntermediateResult".isNull, last($"IntermediateResult", ignoreNulls = true).over(w2)).otherwise($"value"))
.orderBy($"col2")
.show(10)

+-----+----+----+------------------+------+
|value|col2|col3|IntermediateResult|Result|
+-----+----+----+------------------+------+
| 0| exA| 30| null| 16|
| 0| exB| 22| null| 16|
| 0| exC| 19| null| 16|
| 16| exD| 13| 16| 16|
| 5| exE| 28| 16| 5|
| 6| exF| 26| 16| 6|
| 0| exG| 12| 16| 0|
| 13| exH| 53| 16| 13|
+-----+----+----+------------------+------+

关于scala - 用下一个值填充数据框列中的空值,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55374524/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com