
apache-spark - Standalone Spark cluster. Can't submit job programmatically -> java.io.InvalidClassException


Fellow Spark users, I'm very new to Spark, so I would really appreciate your help.

I'm trying to run a very simple job on a Spark cluster from my laptop. It works when I submit it with ./spark-submit, but it throws an exception when I try to do the same thing programmatically.

Environment:
- Spark: 1 master node and 2 worker nodes (standalone mode). Spark was not compiled from source; the binaries were downloaded. Spark version: 1.0.2
- Java version "1.7.0_45"
- The application jar is everywhere (on the client and on the worker nodes)
- The README.md file has also been copied to every node

The application I'm trying to run:

import org.apache.spark.{SparkConf, SparkContext}

val logFile = "/user/vagrant/README.md"

// Point the driver at the standalone master and ship the application jar to the workers
val conf = new SparkConf()
conf.setMaster("spark://192.168.33.50:7077")
conf.setAppName("Simple App")
conf.setJars(List("file:///user/vagrant/spark-1.0.2-bin-hadoop1/bin/hello-apache-spark_2.10-1.0.0-SNAPSHOT.jar"))
conf.setSparkHome("/user/vagrant/spark-1.0.2-bin-hadoop1")

val sc = new SparkContext(conf)

val logData = sc.textFile(logFile, 2).cache()

...

The problem is this: the application runs successfully on the cluster when I submit it like so:
./spark-submit --class com.paycasso.SimpleApp --master spark://192.168.33.50:7077 --deploy-mode client file:///home/vagrant/spark-1.0.2-bin-hadoop1/bin/hello-apache-spark_2.10-1.0.0-SNAPSHOT.jar

But it does not work when I try to do the same thing programmatically by calling sbt run.

Here is the stack trace I get on the master node:
14/09/04 15:09:44 ERROR Remoting: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = -6451051318873184044, local class serialVersionUID = 583745679236071411
java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = -6451051318873184044, local class serialVersionUID = 583745679236071411
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
at scala.util.Try$.apply(Try.scala:161)
at akka.serialization.Serialization.deserialize(Serialization.scala:98)
at akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:58)
at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
at scala.util.Try$.apply(Try.scala:161)
at akka.serialization.Serialization.deserialize(Serialization.scala:98)
at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:55)
at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:55)
at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:73)
at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:764)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

What is the way to solve this problem?
Thanks in advance.

Best Answer

After wasting a lot of time, I found the problem.
Even though I don't use Hadoop/HDFS in my application at all, the Hadoop client still matters. The problem was the hadoop-client version: it differed from the Hadoop version that Spark was built against. Spark's Hadoop version was 1.2.1, but in my application it was 2.4. As the exception message itself says ("local class incompatible"), the practical effect is that the client and the master were running incompatible builds of the Spark classes, which is why deserializing ApplicationDescription fails on the serialVersionUID check.
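You can see such a mismatch directly by comparing the serialVersionUID that each side computes for the class. The following is just a diagnostic sketch using the standard java.io.ObjectStreamClass API, not something from the original post; run it once on the client's classpath and once on the master's, and the printed values will differ when the builds disagree:

import java.io.ObjectStreamClass

// Look up the locally computed serialVersionUID of the Spark class named
// in the exception. ObjectStreamClass.lookup returns null for classes
// that are not serializable, hence the Option wrapper.
val cls = Class.forName("org.apache.spark.deploy.ApplicationDescription")
val uid = Option(ObjectStreamClass.lookup(cls)).map(_.getSerialVersionUID)
println(s"$cls -> serialVersionUID = $uid")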

When I changed the hadoop-client version in my application to 1.2.1, I was able to execute the Spark code on the cluster.
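For illustration, the fix in sbt terms might look like the sketch below. The original post does not show its build file, so the project name, Scala patch version, and exact dependency list here are assumptions; the point is only that hadoop-client must match the Hadoop line of the downloaded Spark binaries (hadoop1 -> 1.2.1):

// Hypothetical build.sbt sketch, not the original project's build file
name := "hello-apache-spark"
version := "1.0.0-SNAPSHOT"
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.2",
  // Must match the cluster's Spark binaries (spark-1.0.2-bin-hadoop1),
  // so 1.2.1 rather than 2.4.x
  "org.apache.hadoop" % "hadoop-client" % "1.2.1"
)

This also suggests why ./spark-submit worked all along: there the driver runs against the jars of the downloaded distribution itself, so both sides of the connection agree on the class versions.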

Regarding apache-spark - Standalone Spark cluster. Can't submit job programmatically -> java.io.InvalidClassException, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25682836/
