
scala - Can't get Spark Aggregators to work

Reposted. Author: 行者123. Updated: 2023-12-01 12:14:17

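I want to try out Aggregators in Scala Spark, but I can't seem to get them to work with both the select function and the groupBy/agg functions (with my current implementation, the agg call fails). My aggregator is written below and should be self-explanatory.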

import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{Encoder, Encoders}

/** Stores the number of true counts (tc) and false counts (fc) */
case class Counts(var tc: Long, var fc: Long)

/** Counts the number of true and false occurrences of a function */
class BooleanCounter[A](f: A => Boolean) extends Aggregator[A, Counts, Counts] with Serializable {
  // Initialize both counts to zero
  def zero: Counts = Counts(0L, 0L)
  // Update the intermediate counts with one new input value
  def reduce(acc: Counts, other: A): Counts = {
    if (f(other)) acc.tc += 1 else acc.fc += 1
    acc
  }
  // Sum the counts of two intermediate values
  def merge(acc1: Counts, acc2: Counts): Counts = {
    acc1.tc += acc2.tc
    acc1.fc += acc2.fc
    acc1
  }
  // Return results
  def finish(acc: Counts): Counts = acc
  // Encoder for intermediate value type
  def bufferEncoder: Encoder[Counts] = Encoders.product[Counts]
  // Encoder for return type
  def outputEncoder: Encoder[Counts] = Encoders.product[Counts]
}
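
For readers unfamiliar with the Aggregator contract, here is a minimal plain-Scala sketch (no Spark involved) of how zero, reduce and merge fit together: reduce folds the records of one partition into a partial result, and merge combines the partial results. The Employee case class, the immutable Counts, the BooleanCounterSketch object and the two-partition split are all assumptions made for illustration; Spark decides the real partitioning.

```scala
// Plain-Scala simulation of the Aggregator life cycle; this is not Spark code.
// Employee is an assumed definition and the partition split is hypothetical.
final case class Employee(name: String, salary: Long)
final case class Counts(tc: Long, fc: Long)

object BooleanCounterSketch {
  // Starting value for every partial result
  val zero: Counts = Counts(0L, 0L)

  // reduce: fold one input record into a partial result
  def reduce(f: Employee => Boolean)(acc: Counts, e: Employee): Counts =
    if (f(e)) acc.copy(tc = acc.tc + 1) else acc.copy(fc = acc.fc + 1)

  // merge: combine two partial results coming from different partitions
  def merge(a: Counts, b: Counts): Counts =
    Counts(a.tc + b.tc, a.fc + b.fc)

  // Run reduce inside each partition, then merge the per-partition partials
  def run(partitions: Seq[Seq[Employee]], f: Employee => Boolean): Counts = {
    val partials = partitions.map(p => p.foldLeft(zero)((acc, e) => reduce(f)(acc, e)))
    partials.foldLeft(zero)((a, b) => merge(a, b))
  }
}
```

With the four employees from the test data split into two partitions of two records each, BooleanCounterSketch.run(partitions, _.salary < 10) gives Counts(1, 3), since only George has a salary below 10.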

Below is my test code.
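import org.apache.spark.sql.Dataset
// Assumes a SparkSession named spark is in scope (as in spark-shell);
// needed for toDS() and the $"..." column syntax
import spark.implicits._

// Assumed definition; Employee is not shown in the original post
case class Employee(name: String, salary: Long)

val ds: Dataset[Employee] = Seq(
  Employee("John", 110),
  Employee("Paul", 100),
  Employee("George", 0),
  Employee("Ringo", 80)
).toDS()

val salaryCounter = new BooleanCounter[Employee]((r: Employee) => r.salary < 10).toColumn
// Usage works fine
ds.select(salaryCounter).show()
// Causes an error
ds.groupBy($"name").agg(salaryCounter).show()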

The first use of salaryCounter works fine, but the second one throws the following exception at runtime.
java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to Employee 

Databricks has a tutorial that is fairly complicated, but it appears to target Spark 2.3. There is also this older tutorial that uses an experimental feature from Spark 1.6.

Best answer

You are incorrectly mixing the "statically typed" and "dynamically typed" APIs. To use the former, you should call agg on a KeyValueGroupedDataset, not on a RelationalGroupedDataset:

ds.groupByKey(_.name).agg(salaryCounter)
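
To put the one-liner in context, here is a minimal end-to-end sketch of the typed route, assuming a local SparkSession and an Employee(name: String, salary: Long) case class (the question does not show Employee, so that definition is a guess). groupByKey returns a KeyValueGroupedDataset that keeps the Employee type, so the aggregator receives Employee values. groupBy($"name") returns a RelationalGroupedDataset that works on untyped rows, which is consistent with the ClassCastException in the question. The object name TypedAggExample and the immutable Counts are choices made for this sketch only.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Assumed definition; the question does not show Employee
case class Employee(name: String, salary: Long)
// Immutable variant of the question's Counts (a choice made for this sketch)
case class Counts(tc: Long, fc: Long)

class BooleanCounter[A](f: A => Boolean) extends Aggregator[A, Counts, Counts] {
  def zero: Counts = Counts(0L, 0L)
  def reduce(acc: Counts, a: A): Counts =
    if (f(a)) acc.copy(tc = acc.tc + 1) else acc.copy(fc = acc.fc + 1)
  def merge(a: Counts, b: Counts): Counts = Counts(a.tc + b.tc, a.fc + b.fc)
  def finish(acc: Counts): Counts = acc
  def bufferEncoder: Encoder[Counts] = Encoders.product[Counts]
  def outputEncoder: Encoder[Counts] = Encoders.product[Counts]
}

object TypedAggExample {
  // Counts employees with salary < 10 per name, using only the typed API
  def countsByName(spark: SparkSession): Map[String, Counts] = {
    import spark.implicits._

    val ds: Dataset[Employee] = Seq(
      Employee("John", 110),
      Employee("Paul", 100),
      Employee("George", 0),
      Employee("Ringo", 80)
    ).toDS()

    val salaryCounter = new BooleanCounter[Employee](_.salary < 10).toColumn

    // Typed grouping: the key is a plain String, the values stay Employee
    val perName: Dataset[(String, Counts)] = ds.groupByKey(_.name).agg(salaryCounter)
    perName.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only
    val spark = SparkSession.builder().master("local[*]").appName("typed-agg").getOrCreate()
    try countsByName(spark).foreach(println)
    finally spark.stop()
  }
}
```

With this data each name forms its own group, so George maps to Counts(1, 0) and the other three map to Counts(0, 1).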

Regarding "scala - Can't get Spark Aggregators to work", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49440766/
