gpt4 book ai didi

scala - Apache Spark - 如何在 3 之后定义 UserDefinedAggregateFunction?

转载 作者:行者123 更新时间:2023-12-02 02:18:27 24 4
gpt4 key购买 nike

我正在使用 Spark 3.0,为了将用户定义的函数用作窗口函数,我需要 UserDefinedAggregateFunction 的实例。最初我认为使用新的 Aggregatorudaf 会解决这个问题(如图 here ),但是 udaf 返回一个 UserDefinedFunction,不是 UserDefinedAggregateFunction

自 Spark 3.0 起,UserDefinedAggregateFunction 已弃用,如 here 所述(尽管仍然可以 find it around )

所以问题是:在 Spark 3.0 中是否有正确的(未弃用的)方法来定义适当的 UserDefinedAggregateFunction 并将其用作窗口函数?

最佳答案

在 Spark 3 中,新 API 使用 Aggregator定义用户定义的聚合:

abstract class Aggregator[-IN, BUF, OUT] extends Serializable:

A base class for user-defined aggregations, which can be used inDataset operations to take all of the elements of a group and reducethem to a single value.

与弃用的 UDAF 相比,聚合器带来了性能改进。可以看到issue Efficient User Defined Aggregators .

下面是一个关于如何定义均值聚合器并使用 functions.udaf 方法注册它的示例:

import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

val meanAgg= new Aggregator[Long, (Long, Long), Double]() {

def zero = (0L, 0L) // Init the buffer

def reduce(y: (Long, Long), x: Long) = (y._1 + x, y._2 + 1)

def merge(a: (Long, Long), b: (Long, Long)) = (a._1 + b._1, a._2 + b._2)

def finish(r: (Long, Long)) = r._1.toDouble / r._2

def bufferEncoder: Encoder[(Long, Long)] = implicitly(ExpressionEncoder[(Long, Long)])

def outputEncoder: Encoder[Double] = implicitly(ExpressionEncoder[Double])
}

val meanUdaf = udaf(meanAgg)

与窗口一起使用:

val df = Seq(
(1, 2), (1, 5),
(2, 3), (2, 1),
).toDF("id", "value")

df.withColumn("mean", meanUdaf($"value").over(Window.partitionBy($"id"))).show
//+---+-----+----+
//| id|value|mean|
//+---+-----+----+
//| 1| 2| 3.5|
//| 1| 5| 3.5|
//| 2| 3| 2.0|
//| 2| 1| 2.0|
//+---+-----+----+

关于scala - Apache Spark - 如何在 3 之后定义 UserDefinedAggregateFunction?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/66808917/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com