scala - Spark Dataset : Example : Unable to generate an encoder issue

Reposted by: 行者123. Updated: 2023-12-04 11:02:17

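I am new to the Spark world and am trying a Dataset example, written in Scala, that I found online.

When I run it through SBT, I keep getting the following error:
org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class
Any idea what I am overlooking?

Also, feel free to point out a better way of writing the same Dataset example.

Thanks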

> sbt>  runMain DatasetExample

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/10/25 01:06:39 INFO Remoting: Starting remoting
16/10/25 01:06:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.150.130:50555]
[error] (run-main-6) org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class `DatasetExample$Student` without access to the scope that this class was defined in. Try moving this class out of its parent class.;
org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class `DatasetExample$Student` without access to the scope that this class was defined in. Try moving this class out of its parent class.;
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$3.applyOrElse(ExpressionEncoder.scala:306)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$$anonfun$3.applyOrElse(ExpressionEncoder.scala:302)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:249)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolve(ExpressionEncoder.scala:302)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:79)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:90)
at org.apache.spark.sql.DataFrame.as(DataFrame.scala:209)
at DatasetExample$.main(DatasetExample.scala:45)
at DatasetExample.main(DatasetExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
[trace] Stack trace suppressed: run last sparkExamples/compile:runMain for the full output.
java.lang.RuntimeException: Nonzero exit code: 1
at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last sparkExamples/compile:runMain for the full output.
[error] (sparkExamples/compile:runMain) Nonzero exit code: 1
[error] Total time: 127 s, completed Oct 25, 2016 1:08:09 AM

Code:

import org.apache.spark._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql._
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.functions._

object DatasetExample  {
   // Create data sets 
   case class Student(name: String, dept: String, age:Long )
   case class Department(abbrevName: String, fullName: String)

   org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this) // Not sure what exactly is the purpose

   def main(args: Array[String]) {
      Logger.getLogger("org").setLevel(Level.OFF)
      Logger.getLogger("akka").setLevel(Level.OFF)
      // initialise spark context
      val conf = new SparkConf().setAppName("SetsExamples").setMaster("local")
      val sc = new SparkContext(conf)
      val sqlcontext = new org.apache.spark.sql.SQLContext(sc)

      import sqlcontext.implicits._   // Not sure what exactly is the purpose

      // Read JSON objects into a Dataset[Student].
      val students = sqlcontext.read.json("student.json").as[Student]
      students.show()

      // Select two columns and filter on one column.
      // Each argument of "select" must be a "TypedColumn".
      students.select($"name".as[String], $"dept".as[String]).
                   filter(_._2 == "Math").  // Filter on _2, the second selected column
                   collect()

      // Group by department and count each group.
      students.groupBy(_.dept).count().collect()

      // Group and aggregate in each group.
      students.groupBy(_.dept).
                  agg(avg($"age").as[Double]).
                  collect()

      // Initialize a Seq and convert to a Dataset.
      val depts = Seq(Department("CS", "Computer Science"), Department("Math", "Mathematics")).toDS()

      // Show the contents of the Dataset.
      depts.show()

      // Join two datasets with "joinWith".
      val joined = students.joinWith(depts, $"dept" === $"abbrevName")

      // Show the contents of the joined Dataset.
      // Note that the original objects are nested into tuples under the _1 and _2 columns.
      joined.show()

      // terminate spark context
      sc.stop()

      }        
}

JSON file (student.json):

{"id" : "1201", "name" : "Kris", "age" : "25"}
{"id" : "1202", "name" : "John", "age" : "28"}
{"id" : "1203", "name" : "Chet", "age" : "39"}
{"id" : "1204", "name" : "Mark", "age" : "23"}
{"id" : "1205", "name" : "Vic", "age" : "23"}

Best answer

This line is what causes the problem:

org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this)

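It means you are adding a new outer scope to this context, which can be used when instantiating an inner class during deserialization.

Inner classes are created when you define case classes in the Spark REPL, and registering the outer scope in which the class was defined allows new instances to be created on the Spark executors.

In normal use (your case), you do not need to call this function.

Edit: You also need to move the case classes outside of the DatasetExample object.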
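
The two changes above (dropping the addOuterScope call and moving the case classes to the top level) could look roughly like the sketch below. It is not the original poster's final code. It assumes Spark 2.x or later, where SparkSession replaces SQLContext and the typed groupBy(func) is called groupByKey, and it uses a few hypothetical in-memory rows instead of student.json so that it is self-contained.

```scala
import org.apache.spark.sql.SparkSession

// Case classes are declared at the top level of the file, not inside the
// object, so Spark can derive encoders for them without an outer scope.
case class Student(name: String, dept: String, age: Long)
case class Department(abbrevName: String, fullName: String)

object DatasetExample {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x+ entry point; the question used SQLContext (Spark 1.6).
    val spark = SparkSession.builder()
      .appName("SetsExamples")
      .master("local[*]")
      .getOrCreate()

    // Brings encoders, toDS() and the $"column" syntax into scope.
    import spark.implicits._

    // Hypothetical sample rows standing in for student.json.
    val students = Seq(
      Student("Kris", "Math", 25L),
      Student("John", "CS", 28L)
    ).toDS()

    val depts = Seq(
      Department("CS", "Computer Science"),
      Department("Math", "Mathematics")
    ).toDS()

    // Typed filter on a case class field.
    students.filter(_.dept == "Math").show()

    // Group by department and count each group.
    students.groupByKey(_.dept).count().show()

    // Join the two Datasets; each result row is a (Student, Department) pair.
    students.joinWith(depts, $"dept" === $"abbrevName").show()

    spark.stop()
  }
}
```

Note that there is no call to OuterScopes.addOuterScope anywhere: with top-level case classes there is no enclosing instance for Spark to look up.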

Note:
import sqlContext.implicits._ is a Scala-specific import that brings into scope the implicit methods used to convert common Scala RDD objects into DataFrames.

More information on this here.
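
As a rough illustration of what that import enables, the sketch below converts an RDD of tuples into a DataFrame and then into a typed Dataset. It assumes Spark 2.x or later, where the equivalent import is spark.implicits._ on a SparkSession; the Person case class, the ImplicitsSketch object, and the sample rows are hypothetical names introduced only for this example.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class, declared at the top level so an encoder can be derived.
case class Person(name: String, age: Long)

object ImplicitsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ImplicitsSketch")
      .master("local[*]")
      .getOrCreate()

    // Without this import, toDF, as[Person] and $"age" below would not compile.
    import spark.implicits._

    // A plain RDD of tuples (hypothetical sample data).
    val rdd = spark.sparkContext.parallelize(Seq(("Kris", 25L), ("John", 28L)))

    // Implicit conversion: RDD of tuples -> DataFrame with named columns.
    val df = rdd.toDF("name", "age")

    // Implicit encoder: DataFrame -> Dataset[Person].
    val people = df.as[Person]

    // $"age" is the column syntax provided by the same import.
    people.filter($"age" > 25).show()

    spark.stop()
  }
}
```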

Regarding scala - Spark Dataset : Example : Unable to generate an encoder issue, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/40229953/