
scala - Spark RDD filter by element class

Reposted. Author: 行者123. Updated: 2023-12-04 20:31:46

I have an RDD whose elements are of different types, and I want to count them by type. For example, the following code works fine:

scala> val rdd = sc.parallelize(List(1, 2.0, "abc"))
rdd: org.apache.spark.rdd.RDD[Any] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> rdd.filter{case z:Int => true; case _ => false}.count
res0: Long = 1

scala> rdd.filter{case z:String => true; case _ => false}.count
res1: Long = 1
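To get the count for every type in one pass, rather than running one filter per type, the elements can be grouped by their runtime class. This is a minimal sketch on a plain Scala List (an assumption made so it runs without Spark); `countByClass` is a hypothetical helper name. Because the elements are stored as `Any`, primitives are boxed, so `1` is reported as `Integer`. On an RDD[Any] the analogous call would be `rdd.map(_.getClass.getSimpleName).countByValue()`, which returns a map of class name to count on the driver.

```scala
// Heterogeneous collection, as in the first example above.
val mixed: List[Any] = List(1, 2.0, "abc")

// Hypothetical helper: group elements by runtime class name, then count each group.
def countByClass(xs: Seq[Any]): Map[String, Int] =
  xs.groupBy(_.getClass.getSimpleName).map { case (name, group) => name -> group.size }

println(countByClass(mixed))
```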

Now, if the elements are of user-defined types, the following code does not work as expected:
scala> class TypeA extends Serializable              // this is the base class
defined class TypeA

scala> case class TypeB(id:Long) extends TypeA // derived class 1
defined class TypeB

scala> case class TypeC(name:String) extends TypeA // derived class 2
defined class TypeC

scala> val rdd1 = sc.parallelize(List(TypeB(123), TypeC("jack"), TypeB(456))) // create an rdd with different types of elements
rdd1: org.apache.spark.rdd.RDD[TypeA with Product] = ParallelCollectionRDD[3] at parallelize at <console>:29

scala> rdd1.count // total size is correct
res2: Long = 3

scala> rdd1.filter{case z:TypeB => true; case _ => false}.count // what the hell?
res3: Long = 0

scala> rdd1.filter{case z:TypeC => true; case _ => false}.count // again ?
res4: Long = 0

scala> rdd1.filter{case z:TypeA => true; case _ => false}.count // only works for the base class?
res5: Long = 3
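For comparison, applying exactly the same predicates to an ordinary Scala List, with the classes compiled as a plain Scala script rather than defined in the Spark shell, gives the counts one would expect (2, 1 and 3). This is a minimal sketch that assumes nothing beyond the standard library; it shows that the predicates themselves are fine and that the problem lies in how they are executed.

```scala
// Same hierarchy as in the question, compiled outside the Spark shell.
class TypeA extends Serializable
case class TypeB(id: Long) extends TypeA
case class TypeC(name: String) extends TypeA

val elems: List[TypeA] = List(TypeB(123), TypeC("jack"), TypeB(456))

// The exact predicates from the question, applied to a local List.
val bCount = elems.count { case _: TypeB => true; case _ => false }
val cCount = elems.count { case _: TypeC => true; case _ => false }
val aCount = elems.count { case _: TypeA => true; case _ => false }

println(s"TypeB: $bCount, TypeC: $cCount, TypeA: $aCount") // TypeB: 2, TypeC: 1, TypeA: 3
```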

Am I missing something here? Please help!

Best answer

This looks like a variant of SPARK-1199 and is most likely a REPL bug.

Running it locally in IntelliJ IDEA gives the expected behavior:

import org.apache.spark.SparkContext

class TypeA extends Serializable
case class TypeB(id:Long) extends TypeA
case class TypeC(name:String) extends TypeA

val sc = new SparkContext("local[*]", "swe")
val rdd = sc.parallelize(List(TypeB(12), TypeC("Hsa")))

rdd.filter { case x: TypeB => true; case _ => false }.count()

Output:
import org.apache.spark.SparkContext

defined class TypeA
defined class TypeB
defined class TypeC

sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@10a1410d
rdd: org.apache.spark.rdd.RDD[TypeA with Product] = ParallelCollectionRDD[0] at parallelize at <console>:18

[Stage 0:>....... (0 + 0) / 4]
res0: Long = 1
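The same type patterns can also be wrapped in a generic helper or used to extract a typed subset. This is a minimal sketch on a plain Scala List (an assumption made so it runs without Spark); `countOfType` is a hypothetical helper, not part of the answer or of Spark's API. The implicit `ClassTag` lets the abstract pattern `_: T` be checked at runtime instead of being erased. Spark's RDD also has a `collect(f: PartialFunction[T, U])` overload that returns an `RDD[U]`, which should behave like the `List.collect` call below when the classes are compiled outside the shell. Note that it is different from the no-argument `collect()`, which returns all elements to the driver as an array.

```scala
import scala.reflect.ClassTag

class TypeA extends Serializable
case class TypeB(id: Long) extends TypeA
case class TypeC(name: String) extends TypeA

// Hypothetical helper: counts elements whose runtime class is T or a subclass of T.
// The ClassTag context bound turns `_: T` into a real runtime type test.
def countOfType[T: ClassTag](xs: Seq[Any]): Int =
  xs.count {
    case _: T => true
    case _    => false
  }

val elems: List[TypeA] = List(TypeB(12), TypeC("Hsa"))

// collect keeps only the elements the partial function is defined for,
// and the result is statically typed as List[TypeB].
val onlyB: List[TypeB] = elems.collect { case b: TypeB => b }

println(countOfType[TypeB](elems)) // 1
println(onlyB)                     // List(TypeB(12))
```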

A similar question about scala - Spark RDD filter by element class can be found on Stack Overflow: https://stackoverflow.com/questions/44473886/
