
hadoop - Spark Streaming + HBase: NoClassDefFoundError: org/apache/hadoop/hbase/spark/HBaseContext


I am trying to connect Spark Streaming with HBase. All I have done with the code is follow the example code, but I get a strange runtime error:

Exception in thread "streaming-job-executor-8" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at buri.sparkour.HBaseInteractor.<init>(HBaseInteractor.java:26)
at buri.sparkour.JavaCustomReceiver.lambda$main$94c29978$1(JavaCustomReceiver.java:104)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:256)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:256)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:256)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:255)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

There are a few questions about this on Stack Overflow, and all of them come down to adding the right jar files to the path. I tried building an "uber" jar with SBT and passing it to spark-submit, but I still get this error.
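For reference, the submit step looks roughly like this (a sketch: the master and the assembly jar path are assumptions, the main class is taken from the stack trace above):

# Jar name/path is assumed; substitute whatever sbt-assembly produces.
spark-submit \
  --class buri.sparkour.JavaCustomReceiver \
  --master yarn \
  target/scala-2.11/sparkour-assembly-1.0.jar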

Here is my build.sbt file:
val sparkVersion = "2.1.0"

val hadoopVersion = "2.7.3"
val hbaseVersion = "1.3.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.commons" % "commons-csv" % "1.2" % "provided",
  "org.apache.hadoop" % "hadoop-hdfs" % "2.5.2" % "provided",
  "org.apache.hbase" % "hbase-spark" % "2.0.0-alpha-1" % "provided",
  "org.apache.hbase" % "hbase-client" % hbaseVersion,
  "org.apache.hadoop" % "hadoop-common" % hadoopVersion % "provided",
  "org.apache.hbase" % "hbase-common" % hbaseVersion,
  "org.apache.hbase" % "hbase-server" % hbaseVersion % "provided",
  "org.apache.hbase" % "hbase" % hbaseVersion
)

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
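One thing worth flagging in this build.sbt: sbt-assembly excludes anything marked "provided" from the assembled jar, so hbase-spark (the artifact that contains HBaseContext) only ends up on the runtime classpath if the cluster supplies it. If the cluster does not ship the HBase jars, a sketch of the change is simply to drop that scope:

  // Sketch: without "provided", sbt-assembly packages hbase-spark
  // (and thus HBaseContext) into the uber jar.
  "org.apache.hbase" % "hbase-spark" % "2.0.0-alpha-1",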

After the uber jar is compiled, I can see that HBaseContext.class is indeed there, so I am not sure why the class cannot be found at runtime.
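The check was along these lines (a sketch; the assembly jar path is an assumption):

# List the assembly contents and grep for the classes the runtime
# claims are missing.
jar tf target/scala-2.11/sparkour-assembly-1.0.jar | grep -E 'HBaseContext|HBaseConfiguration'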

Any ideas/pointers?

(I have also tried defining the classpath via spark.driver.extraClassPath and so on, but that did not work either.)
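For completeness, that attempt looked roughly like the following (a sketch; /opt/hbase/lib stands in for wherever the HBase jars actually live on each node):

# The /opt/hbase/lib path is an assumption; adjust for your cluster.
spark-submit \
  --class buri.sparkour.JavaCustomReceiver \
  --conf "spark.driver.extraClassPath=/opt/hbase/lib/*" \
  --conf "spark.executor.extraClassPath=/opt/hbase/lib/*" \
  target/scala-2.11/sparkour-assembly-1.0.jar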

Best Answer

Have a look at this updated post about NoClassDefFoundError. I am not sure about build.sbt since I use Maven, but the dependencies look fine.

Regarding hadoop - Spark Streaming + HBase: NoClassDefFoundError: org/apache/hadoop/hbase/spark/HBaseContext, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45121246/
