
scala - How to create an Encoder for a scala Iterable in a Spark Dataset


I am trying to create a Dataset from an RDD y with the following schema:

y: RDD[(MyObj1, scala.Iterable[MyObj2])]

So I explicitly created an encoder:

implicit def tuple2[A1, A2](
  implicit e1: Encoder[A1],
           e2: Encoder[A2]
): Encoder[(A1, A2)] = Encoders.tuple[A1, A2](e1, e2)

// Create the Dataset
val z = spark.createDataset(y)(tuple2[MyObj1, Iterable[MyObj2]])

This code compiles without errors, but when I try to run it I get this exception:

Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for scala.Iterable[org.bean.input.MyObj2]
- field (class: "scala.collection.Iterable", name: "_2")
- root class: "scala.Tuple2"
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:625)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:619)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:607)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:607)
at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:438)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
at org.apache.spark.sql.LowPrioritySQLImplicits$class.newProductEncoder(SQLImplicits.scala:233)
at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:33)

Some details about my objects (MyObj1 and MyObj2):
- MyObj1:

case class MyObj1(
  id: String,
  `type`: String  // `type` is a reserved word in Scala and must be backquoted
)

- MyObj2:

trait MyObj2 {
  val o_state: Option[String]
  val n_state: Option[String]
  val ch_inf: MyObj1
  val state_updated: MyObj3
}

Please help.

Accepted answer

Spark does not provide an Encoder for Iterable, so unless you want to use a binary encoder (Encoders.kryo or Encoders.javaSerialization), this won't work.
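As a sketch, the binary-serialization fallback could look roughly like this (the implicit name kryoPair is illustrative, not from the question; it assumes the tuple's components are Kryo-serializable):

```scala
import scala.reflect.ClassTag

import org.apache.spark.sql.{Encoder, Encoders}

// Sketch: encode each pair as an opaque Kryo-serialized binary blob.
// This works for types Spark cannot encode natively (such as
// scala.Iterable), at the cost of losing the columnar schema: the
// resulting Dataset has a single binary "value" column.
implicit def kryoPair[A, B](implicit ct: ClassTag[(A, B)]): Encoder[(A, B)] =
  Encoders.kryo[(A, B)]

// Hypothetical usage against the question's RDD:
//   val z = spark.createDataset(y)(kryoPair[MyObj1, Iterable[MyObj2]])
```

The trade-off is that a Kryo-encoded Dataset cannot be queried column by column, so the Seq-based approach below is usually preferable.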

The closest subclass of Iterable for which Spark does provide Encoders is Seq, so you should use that here. Otherwise, refer to How to store custom objects in Dataset?
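A minimal runnable sketch of that conversion (the case classes Obj1/Obj2 and the local SparkSession are illustrative stand-ins, since the question's MyObj2 is a trait and Spark can only derive encoders for concrete Product types):

```scala
import org.apache.spark.sql.SparkSession

// Simplified stand-ins for the question's classes.
case class Obj1(id: String, kind: String)
case class Obj2(o_state: Option[String])

object IterableToSeqExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("iterable-to-seq")
      .getOrCreate()
    import spark.implicits._

    // Same shape as the question: RDD[(Obj1, Iterable[Obj2])]
    val y = spark.sparkContext.parallelize(
      Seq((Obj1("a", "t"), Iterable(Obj2(Some("x")), Obj2(None)))))

    // Iterable has no Encoder, but Seq does: convert the values
    // first, then let spark.implicits._ derive the tuple encoder.
    val z = spark.createDataset(y.mapValues(_.toSeq))

    println(z.count())  // prints 1
    spark.stop()
  }
}
```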

On "scala - How to create an Encoder for a scala Iterable in a Spark Dataset", see the original question on Stack Overflow: https://stackoverflow.com/questions/48825684/
