scala - value toDF is not a member of org.apache.spark.rdd.RDD


I have already read about this problem in other SO posts, but I still can't figure out what I'm doing wrong. In principle, adding these two lines:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

should do the trick, but the error persists.

This is my build.sbt:
name := "PickACustomer"

version := "1.0"

scalaVersion := "2.11.7"


libraryDependencies ++= Seq(
  "com.databricks" %% "spark-avro" % "2.0.1",
  "org.apache.spark" %% "spark-sql" % "1.6.0",
  "org.apache.spark" %% "spark-core" % "1.6.0")

My Scala code is:
import scala.collection.mutable.Map
import scala.collection.immutable.Vector

import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._


object Foo {

  def reshuffle_rdd(rawText: RDD[String]): RDD[Map[String, (Vector[(Double, Double, String)], Map[String, Double])]] = {...}

  def do_prediction(shuffled: RDD[Map[String, (Vector[(Double, Double, String)], Map[String, Double])]], prediction: (Vector[(Double, Double, String)] => Map[String, Double])): RDD[Map[String, Double]] = {...}

  def get_match_rate_from_results(results: RDD[Map[String, Double]]): Map[String, Double] = {...}

  def retrieve_duid(element: Map[String, (Vector[(Double, Double, String)], Map[String, Double])]): Double = {...}

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName(this.getClass.getSimpleName)
    if (!conf.getOption("spark.master").isDefined) conf.setMaster("local")

    val sc = new SparkContext(conf)

    // This should do the trick
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    val PATH_FILE = "/mnt/fast_export_file_clean.csv"
    val rawText = sc.textFile(PATH_FILE)
    val shuffled = reshuffle_rdd(rawText)

    // PREDICT AS A FUNCTION OF THE LAST SEEN UID
    val results = do_prediction(shuffled.filter(x => retrieve_duid(x) > 1), predict_as_last_uid)
    results.cache()

    case class Summary(ismatch: Double, t_to_last: Double, nflips: Double, d_uid: Double, truth: Double, guess: Double)

    val summary = results.map(x => Summary(x("match"), x("t_to_last"), x("nflips"), x("d_uid"), x("truth"), x("guess")))

    // PROBLEMATIC LINE
    val sum_df = summary.toDF()
  }
}

I always get:

value toDF is not a member of org.apache.spark.rdd.RDD[Summary]



I'm a bit lost now. Any ideas?

Best Answer

Move the case class outside of main:

object Foo {

  case class Summary(ismatch: Double, t_to_last: Double, nflips: Double, d_uid: Double, truth: Double, guess: Double)

  def main(args: Array[String]) {
    ...
  }
}

Something about its scope prevents Spark from handling the automatic derivation of the schema for Summary. FYI, I actually got a different error with sbt:

No TypeTag available for Summary
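
For completeness, here is a minimal, self-contained sketch of the fixed structure under Spark 1.6 (the object name and sample data below are illustrative, not taken from the original post). With the case class defined on the object rather than inside main, the compiler can supply the TypeTag that the implicits from sqlContext.implicits._ require, and toDF() resolves:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object Minimal {

  // Defined on the object, not inside main, so a TypeTag and schema can be derived
  case class Summary(ismatch: Double, truth: Double, guess: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Minimal").setMaster("local")
    val sc = new SparkContext(conf)

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Illustrative data only
    val summary = sc.parallelize(Seq(Summary(1.0, 1.0, 1.0), Summary(0.0, 1.0, 0.0)))

    val sum_df = summary.toDF()   // compiles now that Summary is not local to main
    sum_df.show()

    sc.stop()
  }
}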

Regarding scala - value toDF is not a member of org.apache.spark.rdd.RDD, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/36055774/
