
apache-spark - Excluding a dependency from CDH spark-core


I am using Spark Structured Streaming to write data from Kafka to HBase.

My cluster distribution is Hadoop 3.0.0-cdh6.2.0, and I am using Spark 2.4.0.

My code is as follows:

import org.apache.spark.sql.{Dataset, Row}
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Read the Kafka topic as a streaming DataFrame with string key/value columns.
val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrapServers)
  .option("subscribe", topic)
  .option("failOnDataLoss", false)
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

// Write each micro-batch to HBase through the SHC connector.
df.writeStream
  .foreachBatch { (batchDF: Dataset[Row], batchId: Long) =>
    batchDF.write
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "6"))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .save()
  }
  .option("checkpointLocation", checkpointDirectory)
  .start()
  .awaitTermination()
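
The catalog value used above is not shown in the question; for reference, an SHC table catalog is a JSON string along these lines (the namespace, table name, and column family below are assumptions for illustration, not from the original post):

// Hypothetical SHC catalog mapping the DataFrame's "key" column to the HBase
// row key and its "value" column to a column in family "cf1".
val catalog = s"""{
  "table":{"namespace":"default", "name":"my_table"},
  "rowkey":"key",
  "columns":{
    "key":{"cf":"rowkey", "col":"key", "type":"string"},
    "value":{"cf":"cf1", "col":"value", "type":"string"}
  }
}"""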

HBaseTableCatalog uses the json4s-jackson_2.11 library. That library ships inside Spark Core, but in the wrong version, which causes a conflict...

To work around this, I excluded json4s-jackson_2.11 from spark-core and added the downgraded version to my pom:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.4.0-cdh6.2.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.json4s</groupId>
      <artifactId>json4s-jackson_2.11</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.json4s</groupId>
  <artifactId>json4s-jackson_2.11</artifactId>
  <version>3.2.11</version>
</dependency>
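
To confirm that the exclusion actually takes effect, you can inspect Maven's resolved dependency tree (a quick check, assuming a standard Maven build); only the 3.2.11 artifact should appear under org.json4s:

mvn dependency:tree -Dincludes=org.json4s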

When I run the code on my local machine it works fine, but when I submit it to the Cloudera cluster I hit the original library-conflict error:
Caused by: java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:257)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:80)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:59)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
at com.App$$anonfun$main$1.apply(App.scala:129)
at com.App$$anonfun$main$1.apply(App.scala:126)

I know the cluster has its own Hadoop and Spark libraries and uses them, so in my spark-submit I set spark.driver.userClassPathFirst and spark.executor.userClassPathFirst to true. But then I get another error, which I don't understand:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.<init>(YarnSparkHadoopUtil.scala:48)
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.<clinit>(YarnSparkHadoopUtil.scala)
at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply$mcJ$sp(Client.scala:83)
at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply(Client.scala:83)
at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply(Client.scala:83)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.yarn.Client.<init>(Client.scala:82)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1603)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassCastException: org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl cannot be cast to org.apache.hadoop.yarn.api.records.Priority
at org.apache.hadoop.yarn.api.records.Priority.newInstance(Priority.java:39)
at org.apache.hadoop.yarn.api.records.Priority.<clinit>(Priority.java:34)
... 15 more
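
For reference, that attempt corresponds to a submit roughly like the following (a sketch; the exact command is not shown in the post, and app.jar stands in for the real artifact — com.App is the main class from the stack trace above):

spark-submit \
  --master yarn \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --class com.App app.jar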

In the end, what I want is to make Spark use the json4s-jackson_2.11 from my pom instead of the one bundled with Spark Core.

Best Answer

To solve this, do not use spark.driver.userClassPathFirst and spark.executor.userClassPathFirst; use spark.driver.extraClassPath and spark.executor.extraClassPath instead. (With userClassPathFirst, your jar's transitive Hadoop/YARN classes shadow the cluster's own, which is the likely cause of the PriorityPBImpl ClassCastException above.)

The official documentation defines it as: "Extra classpath entries to prepend to the classpath of the driver."

  • This "prepend" puts your entries ahead of Spark's core classpath.

  • Example:

    --conf spark.driver.extraClassPath=C:\Users\Khalid\Documents\Projects\libs\jackson-annotations-2.6.0.jar;C:\Users\Khalid\Documents\Projects\libs\jackson-core-2.6.0.jar;C:\Users\Khalid\Documents\Projects\libs\jackson-databind-2.6.0.jar
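
On a Linux/YARN cluster the same idea applies with ':' as the path separator instead of ';'. A minimal sketch (the jar location below is an assumption; for the executor setting, the jar must exist at that path on every node, or be shipped with --jars):

spark-submit \
  --master yarn \
  --conf spark.driver.extraClassPath=/path/to/json4s-jackson_2.11-3.2.11.jar \
  --conf spark.executor.extraClassPath=/path/to/json4s-jackson_2.11-3.2.11.jar \
  --class com.App app.jar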



This solved my problem (a conflict between the Jackson version I wanted to use and the one Spark was using).

Hope this helps.

Regarding "apache-spark - Excluding a dependency from CDH spark-core", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57329060/
