
scala - How to create a sample DataFrame in Scala/Spark

Reposted · Author: 行者123 · Updated: 2023-12-04 17:52:37

I am trying to create a simple DataFrame as follows:

import sqlContext.implicits._

val lookup = Array("one", "two", "three", "four", "five")
val theRow = Array("1", Array(1, 2, 3), Array(0.1, 0.4, 0.5))
val theRdd = sc.makeRDD(theRow)

case class X(id: String, indices: Array[Integer], weights: Array[Float])

val df = theRdd.map {
  case Array(s0, s1, s2) =>
    X(s0.asInstanceOf[String], s1.asInstanceOf[Array[Integer]], s2.asInstanceOf[Array[Float]])
}.toDF()

df.show()

df is defined as
df: org.apache.spark.sql.DataFrame = [id: string, indices: array<int>, weights: array<float>]

which is exactly what I want.

But when I execute it, I get

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 13.0 failed 1 times, most recent failure: Lost task 1.0 in stage 13.0 (TID 50, localhost): scala.MatchError: 1 (of class java.lang.String)

Where does this MatchError come from? And is there a simpler way to create sample DataFrames programmatically?

Best Answer

First of all, theRow should be a Row, not an Array: sc.makeRDD(theRow) turns the three elements of the array into three separate records, and the MatchError is thrown when the first record, the String "1", fails to match the pattern Array(s0, s1, s2). Now, if you modify your types in a way that respects the compatibility between Java and Scala (boxed java.lang.Integer for the indices, and Double rather than Float for the weights, since the literals 0.1, 0.4, 0.5 are Doubles), your example works:

import org.apache.spark.sql.Row

val theRow = Row("1", Array[java.lang.Integer](1, 2, 3), Array[Double](0.1, 0.4, 0.5))
val theRdd = sc.makeRDD(Array(theRow))

case class X(id: String, indices: Array[Integer], weights: Array[Double])

val df = theRdd.map {
  case Row(s0, s1, s2) =>
    X(s0.asInstanceOf[String], s1.asInstanceOf[Array[Integer]], s2.asInstanceOf[Array[Double]])
}.toDF()

df.show()

//+---+---------+---------------+
//| id| indices| weights|
//+---+---------+---------------+
//| 1|[1, 2, 3]|[0.1, 0.4, 0.5]|
//+---+---------+---------------+
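
As for the second part of the question: a simpler way to build a sample DataFrame programmatically is to skip the RDD and the pattern match entirely and call toDF directly on a local Seq. This is a minimal sketch, assuming the same import sqlContext.implicits._ is in scope (the names df2 and df3 are illustrative):

// Build the DataFrame directly from a local collection of case class instances.
case class X(id: String, indices: Array[Integer], weights: Array[Double])

val df2 = Seq(
  X("1", Array[java.lang.Integer](1, 2, 3), Array[Double](0.1, 0.4, 0.5))
).toDF()
df2.show()

// Or use tuples and name the columns explicitly:
val df3 = Seq(("1", Array(1, 2, 3), Array(0.1, 0.4, 0.5)))
  .toDF("id", "indices", "weights")
df3.show()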

Regarding "scala - How to create a sample DataFrame in Scala/Spark", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35383447/
