
scala - Spark multiple contexts


In short:

EC2 cluster: 1 master, 3 slaves

Spark version: 1.3.1

I would like to use the option spark.driver.allowMultipleContexts to run one local context (master only) and one cluster context (master and slaves).

I get the following stack trace (line 29 is where I call the object that initializes the second SparkContext):

fr.entry.Main.main(Main.scala)
at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1$$anonfun$apply$10.apply(SparkContext.scala:1812)
at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1$$anonfun$apply$10.apply(SparkContext.scala:1808)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:1808)
at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:1795)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:1795)
at org.apache.spark.SparkContext$.setActiveContext(SparkContext.scala:1847)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:1754)
at fr.entry.cluster$.<init>(Main.scala:79)
at fr.entry.cluster$.<clinit>(Main.scala)
at fr.entry.Main$delayedInit$body.apply(Main.scala:29)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
at scala.App$class.main(App.scala:71)
at fr.entry.Main$.main(Main.scala:14)
at fr.entry.Main.main(Main.scala)
15/09/28 15:33:30 INFO AppClient$ClientActor: Executor updated: app-20150928153330-0036/2 is now LOADING
15/09/28 15:33:30 INFO AppClient$ClientActor: Executor updated: app-20150928153330-0036/0 is now RUNNING
15/09/28 15:33:30 INFO AppClient$ClientActor: Executor updated: app-20150928153330-0036/1 is now RUNNING
15/09/28 15:33:30 INFO SparkContext: Starting job: sum at Main.scala:29
15/09/28 15:33:30 INFO DAGScheduler: Got job 0 (sum at Main.scala:29) with 2 output partitions (allowLocal=false)
15/09/28 15:33:30 INFO DAGScheduler: Final stage: Stage 0(sum at Main.scala:29)
15/09/28 15:33:30 INFO DAGScheduler: Parents of final stage: List()
15/09/28 15:33:30 INFO DAGScheduler: Missing parents: List()
15/09/28 15:33:30 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[2] at numericRDDToDoubleRDDFunctions at Main.scala:29), which has no missing parents
15/09/28 15:33:30 INFO MemoryStore: ensureFreeSpace(2264) called with curMem=0, maxMem=55566516879
15/09/28 15:33:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.2 KB, free 51.8 GB)
15/09/28 15:33:30 INFO MemoryStore: ensureFreeSpace(1656) called with curMem=2264, maxMem=55566516879
15/09/28 15:33:30 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1656.0 B, free 51.8 GB)
15/09/28 15:33:30 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:40476 (size: 1656.0 B, free: 51.8 GB)
15/09/28 15:33:30 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/09/28 15:33:30 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:839
15/09/28 15:33:30 INFO AppClient$ClientActor: Executor updated: app-20150928153330-0036/2 is now RUNNING
15/09/28 15:33:30 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MapPartitionsRDD[2] at numericRDDToDoubleRDDFunctions at Main.scala:29)
15/09/28 15:33:30 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/09/28 15:33:45 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/09/28 15:34:00 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

More details:

I want to run a single program that does two things. First, I have a local SparkContext (on the master only), with which I create an RDD and perform some operations. Second, I initialize a second SparkContext with one master and 3 slaves, which also creates an RDD and performs some operations.
So in the first case I want to use the master's 16 cores, and in the second case the slaves' 8 cores x 3.

A simple example:
val arr = Array(Array(1, 2, 3, 4, 5, 6, 7, 8), Array(1, 2, 3, 4, 5, 6, 7, 8))
println(local.sparkContext.makeRDD(arr).count())
println(cluster.sparkContext.makeRDD(arr).map(l => l.sum).sum)

My two SparkContexts:
import org.apache.spark.{SparkConf, SparkContext}

object local {

val project = "test"
val version = "1.0"

val sc = new SparkConf()
.setMaster("local[16]")
.setAppName("Local")
.set("spark.local.dir", "/mnt")
.setJars(Seq("target/scala-2.10/" + project + "-assembly-" + version + ".jar", "target/scala-2.10/" + project + "_2.10-" + version + "-tests.jar"))
.setSparkHome("/root/spark")
.set("spark.driver.allowMultipleContexts", "true")
.set("spark.executor.memory", "45g")

val sparkContext = new SparkContext(sc)
}

object cluster {

val project = "test"
val version = "1.0"

val sc = new SparkConf()
.setMaster(masterURL) // e.g. "spark://ec2-XX-XXX-XXX-XX.compute-1.amazonaws.com:7077"
.setAppName("Cluster")
.set("spark.local.dir", "/mnt")
.setJars(Seq("target/scala-2.10/" + project + "-assembly-" + version + ".jar", "target/scala-2.10/" + project + "_2.10-" + version + "-tests.jar") ++ otherJars)
.setSparkHome("/root/spark")
.set("spark.driver.allowMultipleContexts", "true")
.set("spark.executor.memory", "35g")

val sparkContext = new SparkContext(sc)
}

How can I solve this problem?

Best Answer

Although the configuration option spark.driver.allowMultipleContexts exists, it is misleading, because the use of multiple Spark contexts is discouraged. This option is intended only for Spark's internal tests and should not be used in user programs. Running more than one Spark context in a single JVM can produce unexpected results.
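If the two workloads do not have to run concurrently, a safer pattern is to drive them sequentially from a single JVM, stopping the first SparkContext before creating the second, so that allowMultipleContexts is never needed. A minimal sketch, reusing the data from the example above (the spark:// master URL is a placeholder for your actual master host):

import org.apache.spark.{SparkConf, SparkContext}

object Main extends App {
  val arr = Array(Array(1, 2, 3, 4, 5, 6, 7, 8), Array(1, 2, 3, 4, 5, 6, 7, 8))

  // First workload: a local context using the master's 16 cores.
  val localSc = new SparkContext(
    new SparkConf().setMaster("local[16]").setAppName("Local"))
  println(localSc.makeRDD(arr).count())
  localSc.stop() // release the context before creating the next one

  // Second workload: a cluster context using the slaves' cores.
  // The URL below is a placeholder for your standalone master.
  val clusterSc = new SparkContext(
    new SparkConf()
      .setMaster("spark://ec2-XX-XXX-XXX-XX.compute-1.amazonaws.com:7077")
      .setAppName("Cluster"))
  println(clusterSc.makeRDD(arr).map(_.sum).sum)
  clusterSc.stop()
}

Alternatively, if only one of the two configurations is needed per run, submit the program twice with a different --master argument and build the SparkConf without calling setMaster, letting spark-submit supply the master URL.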

Regarding "scala - Spark multiple contexts", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32827333/
