scala - SparkAppHandle Listener not getting invoked


I am trying to submit a Spark 2.3 job to a Kubernetes cluster from Scala, using the Play framework.

I have also tried it as a simple Scala program, without the Play framework.

The job gets submitted to the k8s cluster, but stateChanged and infoChanged are never invoked. I also want to be able to get handle.getAppId.

I am submitting the job with spark-submit, as described here:

$ bin/spark-submit \
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=<spark-image> \
local:///path/to/examples.jar

Here is the code for the job:
def index = Action {
  try {
    val spark = new SparkLauncher()
      .setMaster("my k8 apiserver host")
      .setVerbose(true)
      .addSparkArg("--verbose")
      .setMainClass("myClass")
      .setAppResource("hdfs://server/inputs/my.jar")
      .setConf("spark.app.name", "myapp")
      .setConf("spark.executor.instances", "5")
      .setConf("spark.kubernetes.container.image", "mydockerimage")
      .setDeployMode("cluster")
      .startApplication(new SparkAppHandle.Listener() {

        def infoChanged(handle: SparkAppHandle): Unit = {
          System.out.println("Spark App Id ["
            + handle.getAppId
            + "] Info Changed. State ["
            + handle.getState + "]")
        }

        def stateChanged(handle: SparkAppHandle): Unit = {
          System.out.println("Spark App Id ["
            + handle.getAppId
            + "] State Changed. State ["
            + handle.getState + "]")
          if (handle.getState.toString == "FINISHED") System.exit(0)
        }
      })

    Ok(spark.getState().toString())

  } catch {
    case NonFatal(e) => {
      println("failed with exception: " + e)
    }
  }
  Ok
}

Best Answer

Spark Launcher architecture overview

SparkLauncher lets you run the spark-submit command programmatically. It runs as a separate child thread inside the JVM. You need to wait in your client's main function until the driver has started in K8s and the listener callbacks arrive; otherwise the JVM's main thread exits, killing the client before anything is reported.

-----------------------                       -----------------------
|      User App       |     spark-submit      |      Spark App      |
|                     |  -------------------> |                     |
|         ------------|                       |-------------        |
|         |           |        hello          |            |        |
|         | L. Server |<----------------------| L. Backend |        |
|         |           |                       |-------------        |
|         -------------                       -----------------------
|               |     |                              ^
|               v     |                              |
|        -------------|                              |
|        |            |      <per-app channel>       |
|        | App Handle |<------------------------------
|        |            |
-----------------------
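
To make this concrete, here is a minimal sketch of the same idea without callbacks: it assumes `handle` is the SparkAppHandle returned by startApplication, and simply polls the handle so the launcher JVM stays alive until the application reaches a final state (SparkAppHandle.State.isFinal is part of the launcher API):

import org.apache.spark.launcher.SparkAppHandle

// Minimal sketch: keep the launcher JVM alive until the application
// reaches a final state, polling instead of relying on callbacks.
// `handle` is assumed to come from SparkLauncher.startApplication(...).
def waitForFinalState(handle: SparkAppHandle): Unit = {
  while (!handle.getState.isFinal) {
    println(s"appId=${handle.getAppId}, state=${handle.getState}")
    Thread.sleep(1000) // poll once per second
  }
}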

Solution

I added a j.u.c.CountDownLatch implementation that prevents the main thread from exiting until appState.isFinal is reached:
import java.util.concurrent.CountDownLatch

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object SparkLauncher {
  def main(args: Array[String]): Unit = {

    val countDownLatch = new CountDownLatch(1)

    val launcher = new SparkLauncher()
      .setMaster("k8s://http://127.0.0.1:8001")
      .setAppResource("local:/{PATH}/spark-examples_2.11-2.3.0.jar")
      .setConf("spark.app.name", "spark-pi")
      .setMainClass("org.apache.spark.examples.SparkPi")
      .setConf("spark.executor.instances", "5")
      .setConf("spark.kubernetes.container.image", "spark:spark-docker")
      .setConf("spark.kubernetes.driver.pod.name", "spark-pi-driver")
      .setDeployMode("cluster")
      .startApplication(new SparkAppHandle.Listener() {
        def infoChanged(handle: SparkAppHandle): Unit = {
        }

        def stateChanged(handle: SparkAppHandle): Unit = {
          val appState = handle.getState()
          println(s"Spark App Id [${handle.getAppId}] State Changed. State [${handle.getState}]")

          if (appState != null && appState.isFinal) {
            countDownLatch.countDown // waiting until the Spark driver exits
          }
        }
      })

    countDownLatch.await()
  }
}
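
Since the question used a Play controller, a hypothetical adaptation of the same idea is sketched below: instead of blocking the request thread with a latch, the listener completes a Promise when a final state is reached, and the action returns the result asynchronously. The controller and action names are illustrative, not from the original code:

import javax.inject.Inject

import scala.concurrent.{ExecutionContext, Promise}

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}
import play.api.mvc._

// Hypothetical Play controller: the listener completes a Promise when the
// application reaches a final state, and the action responds asynchronously.
class SparkJobController @Inject()(cc: ControllerComponents)
                                  (implicit ec: ExecutionContext)
    extends AbstractController(cc) {

  def index = Action.async {
    val finalState = Promise[String]()

    new SparkLauncher()
      .setMaster("k8s://http://127.0.0.1:8001")
      .setAppResource("local:/{PATH}/spark-examples_2.11-2.3.0.jar")
      .setMainClass("org.apache.spark.examples.SparkPi")
      .setDeployMode("cluster")
      .startApplication(new SparkAppHandle.Listener() {
        def infoChanged(handle: SparkAppHandle): Unit = ()
        def stateChanged(handle: SparkAppHandle): Unit = {
          val state = handle.getState
          if (state != null && state.isFinal) finalState.trySuccess(state.toString)
        }
      })

    finalState.future.map(state => Ok(s"Spark job finished with state: $state"))
  }
}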

Regarding "scala - SparkAppHandle Listener not getting invoked", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49713004/
