gpt4 book ai didi

scala - 将 Scala 代码部署到 Spark 时 ClassNotFoundException anonfun

转载 作者:行者123 更新时间:2023-12-04 09:46:17 26 4
gpt4 key购买 nike

我是 Apache Spark 的新手,我正在尝试将一段简单的 Scala 代码部署到 Spark。

注:我正在尝试连接到一个现有的正在运行的集群,我通过我的 java 参数将其配置为:spark.master=spark://MyHostName:7077
环境

  • Spark 1.5.1 使用 Scala 构建 2.10
  • Spark 运行 独立 我本地机器上的模式
  • 操作系统 : Mac OS El 船长
  • JVM :JDK 1.8.0_60
  • IDE : IntelliJ IDEA 社区 14.1.5
  • 斯卡拉 版本:2.10.4

  • sbt : 0.13.8

    代码
    import org.apache.spark.{SparkConf, SparkContext}
    object HelloSpark {
    def main(args: Array[String]) {
    val logFile = "/README.md"
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    println("%s done!".format(numAs))
    }
    }

    build.sbt
    name := "data-streamer210"

    version := "1.0"

    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
    "org.apache.spark" % "spark-core_2.10" % "1.5.1",
    "org.apache.spark" % "spark-streaming_2.10" % "1.5.1",
    "org.apache.spark" % "spark-mllib_2.10" % "1.5.1",
    "org.apache.spark" % "spark-bagel_2.10" % "1.5.1",
    "org.apache.spark" % "spark-streaming-twitter_2.10" % "1.5.1"
    )

    错误
    15/10/19 19:40:09 INFO SparkContext: Starting job: count at HelloSpark.scala:14
    15/10/19 19:40:09 INFO DAGScheduler: Got job 0 (count at HelloSpark.scala:14) with 2 output partitions
    15/10/19 19:40:09 INFO DAGScheduler: Final stage: ResultStage 0(count at HelloSpark.scala:14)
    15/10/19 19:40:09 INFO DAGScheduler: Parents of final stage: List()
    15/10/19 19:40:09 INFO DAGScheduler: Missing parents: List()
    15/10/19 19:40:09 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at filter at HelloSpark.scala:14), which has no missing parents
    15/10/19 19:40:09 INFO MemoryStore: ensureFreeSpace(3192) called with curMem=120313, maxMem=2061647216
    15/10/19 19:40:09 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.1 KB, free 1966.0 MB)
    15/10/19 19:40:09 INFO MemoryStore: ensureFreeSpace(1892) called with curMem=123505, maxMem=2061647216
    15/10/19 19:40:09 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1892.0 B, free 1966.0 MB)
    15/10/19 19:40:09 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 127.0.0.1:50941 (size: 1892.0 B, free: 1966.1 MB)
    15/10/19 19:40:09 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:861
    15/10/19 19:40:09 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at filter at HelloSpark.scala:14)
    15/10/19 19:40:09 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
    15/10/19 19:40:10 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@127.0.0.1:50951/user/Executor#-147774947]) with ID 0
    15/10/19 19:40:10 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 127.0.0.1, PROCESS_LOCAL, 2160 bytes)
    15/10/19 19:40:10 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 127.0.0.1, PROCESS_LOCAL, 2160 bytes)
    15/10/19 19:40:10 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@127.0.0.1:50952/user/Executor#1450479604]) with ID 2
    15/10/19 19:40:10 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@127.0.0.1:50957/user/Executor#1447408721]) with ID 1
    15/10/19 19:40:10 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@127.0.0.1:50955/user/Executor#1397136754]) with ID 3
    15/10/19 19:40:10 INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:50963 with 530.0 MB RAM, BlockManagerId(0, 127.0.0.1, 50963)
    15/10/19 19:40:10 INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:50964 with 530.0 MB RAM, BlockManagerId(2, 127.0.0.1, 50964)
    15/10/19 19:40:10 INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:50965 with 530.0 MB RAM, BlockManagerId(1, 127.0.0.1, 50965)
    15/10/19 19:40:10 INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:50966 with 530.0 MB RAM, BlockManagerId(3, 127.0.0.1, 50966)
    15/10/19 19:40:11 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 127.0.0.1:50963 (size: 1892.0 B, free: 530.0 MB)
    15/10/19 19:40:11 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 127.0.0.1): java.lang.ClassNotFoundException: HelloSpark$$anonfun$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

    15/10/19 19:40:11 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor 127.0.0.1: java.lang.ClassNotFoundException (HelloSpark$$anonfun$1) [duplicate 1]
    15/10/19 19:40:11 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, 127.0.0.1, PROCESS_LOCAL, 2160 bytes)
    15/10/19 19:40:11 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, 127.0.0.1, PROCESS_LOCAL, 2160 bytes)
    15/10/19 19:40:11 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 127.0.0.1:50966 (size: 1892.0 B, free: 530.0 MB)
    15/10/19 19:40:11 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 127.0.0.1:50964 (size: 1892.0 B, free: 530.0 MB)
    15/10/19 19:40:11 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3) on executor 127.0.0.1: java.lang.ClassNotFoundException (HelloSpark$$anonfun$1) [duplicate 2]
    15/10/19 19:40:11 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 4, 127.0.0.1, PROCESS_LOCAL, 2160 bytes)
    15/10/19 19:40:11 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 4) on executor 127.0.0.1: java.lang.ClassNotFoundException (HelloSpark$$anonfun$1) [duplicate 3]
    15/10/19 19:40:11 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2) on executor 127.0.0.1: java.lang.ClassNotFoundException (HelloSpark$$anonfun$1) [duplicate 4]
    15/10/19 19:40:11 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 5, 127.0.0.1, PROCESS_LOCAL, 2160 bytes)
    15/10/19 19:40:11 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 6, 127.0.0.1, PROCESS_LOCAL, 2160 bytes)
    15/10/19 19:40:11 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 5) on executor 127.0.0.1: java.lang.ClassNotFoundException (HelloSpark$$anonfun$1) [duplicate 5]
    15/10/19 19:40:11 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 7, 127.0.0.1, PROCESS_LOCAL, 2160 bytes)
    15/10/19 19:40:11 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 7) on executor 127.0.0.1: java.lang.ClassNotFoundException (HelloSpark$$anonfun$1) [duplicate 6]
    15/10/19 19:40:11 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
    15/10/19 19:40:11 INFO TaskSchedulerImpl: Cancelling stage 0
    15/10/19 19:40:11 INFO TaskSchedulerImpl: Stage 0 was cancelled
    15/10/19 19:40:11 INFO DAGScheduler: ResultStage 0 (count at HelloSpark.scala:14) failed in 2.613 s
    15/10/19 19:40:11 INFO DAGScheduler: Job 0 failed: count at HelloSpark.scala:14, took 2.716305 s
    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, 127.0.0.1): java.lang.ClassNotFoundException: HelloSpark$$anonfun$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

    Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1822)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1835)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1848)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1919)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1121)
    at HelloSpark$.main(HelloSpark.scala:14)
    at HelloSpark.main(HelloSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
    Caused by: java.lang.ClassNotFoundException: HelloSpark$$anonfun$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    15/10/19 19:40:11 INFO SparkContext: Invoking stop() from shutdown hook
    15/10/19 19:40:11 WARN TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6, 127.0.0.1): org.apache.spark.TaskKilledException
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:204)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

    15/10/19 19:40:11 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
    15/10/19 19:40:11 INFO SparkUI: Stopped Spark web UI at http://127.0.0.1:4040
    15/10/19 19:40:11 INFO DAGScheduler: Stopping DAGScheduler
    15/10/19 19:40:11 INFO SparkDeploySchedulerBackend: Shutting down all executors
    15/10/19 19:40:11 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
    15/10/19 19:40:11 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    15/10/19 19:40:11 INFO MemoryStore: MemoryStore cleared
    15/10/19 19:40:11 INFO BlockManager: BlockManager stopped
    15/10/19 19:40:11 INFO BlockManagerMaster: BlockManagerMaster stopped
    15/10/19 19:40:11 INFO SparkContext: Successfully stopped SparkContext
    15/10/19 19:40:11 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    15/10/19 19:40:11 INFO ShutdownHookManager: Shutdown hook called
    15/10/19 19:40:11 INFO ShutdownHookManager: Deleting directory /private/var/folders/q9/m_d81ms107n09tj8k5wbzfb40000gp/T/spark-53ce9474-5488-4d50-bfb6-c58ddeed7640

    Process finished with exit code 1

    最佳答案

    当您从 IntelliJ 运行 Spark 时,您可以连接到“本地” Spark JVM 或远程集群。

    如果您将 master 设置为本地(例如, setMaster("local[*]") ),那么您在本地范围/项目中拥有的任何代码都将可用于您刚刚创建的这个临时的本地(单个 JVM)集群。一切都在本地运行,并会在您的测试结束时退出(如果您正在运行单元测试),或者如果您在 IntelliJ 中将其作为应用程序运行,则在您退出应用程序时退出。

    但是,如果您将 master 设置为指向远程集群(比如 setMaster("spark://localhost:7077") ),您需要确保您的集群可以访问您的新代码(在您的情况下,它需要访问您传递给 filter 的闭包)。

    当我想在正在运行的 Spark 集群上执行一段新代码时,我通常通过将我的应用程序打包在 Uber Jar 中(请参阅 sbt-assembly )然后将其作为参数传递给 spark-submit(点击链接查看更多详细信息) )。

    关于scala - 将 Scala 代码部署到 Spark 时 ClassNotFoundException anonfun,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/33222045/

    26 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com