apache-spark - Databricks notebook detaches in standard cluster mode

The Databricks notebook repeatedly detaches while in use.

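Our data scientist comes from a Jupyter background. Because Koalas has some gaps, he keeps using Pandas with a few workarounds. That puts a much heavier load on the driver, but the notebook itself seems to work.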


Here is my Spark configuration:

spark.driver.extraJavaOptions -XX:+UseG1GC
spark.driver.cores 8
spark.driver.memory 16g
spark.executor.extraJavaOptions -XX:+UseG1GC



Notebook detached
Exception when creating execution context:
java.util.concurrent.TimeoutException: Exchange timed out after 15 seconds.

Spark context stopped

The spark context has been stopped or the cluster has been terminated.
Please restart the cluster or attach this notebook to a different cluster.




19/10/08 18:02:59 INFO TaskSchedulerImpl: Killing all running tasks in stage 82: Stage finished
19/10/08 18:02:59 INFO DAGScheduler: Job 57 finished: collectResult at OutputAggregator.scala:149, took 9.157699 s
19/10/08 18:02:59 INFO SQLAppStatusListener: Execution ID: 28 Total Executor Run Time: 21250
19/10/08 18:02:59 INFO CodeGenerator: Code generated in 21.921114 ms
19/10/08 18:03:00 INFO ProgressReporter$: Removed result fetcher for 8919779546758574174_8732072469296650198_763335e3d46b4641ba75b3c6d4b4ffac
19/10/08 18:04:30 INFO DriverCorral$: Cleaning the wrapper ReplId-5231d-7b5c0-a6423-e (currently in status Idle(ReplId-5231d-7b5c0-a6423-e))
19/10/08 18:04:30 INFO DriverCorral$: sending shutdown signal for REPL ReplId-5231d-7b5c0-a6423-e
19/10/08 18:04:31 INFO PythonDriverLocal$Watchdog: Python shell exit code: 143
19/10/08 18:04:31 INFO PythonDriverLocal$RedirectThread: Python RedirectThread exit
19/10/08 18:04:31 INFO PythonDriverLocal$RedirectThread: Python RedirectThread exit
19/10/08 18:04:31 INFO PythonDriverLocal$Watchdog: No strace information recovered: /tmp/637654b25044473abae9a282b9564078.strace is missing
19/10/08 18:04:31 INFO DriverCorral$: sending the interrupt signal for REPL ReplId-5231d-7b5c0-a6423-e
19/10/08 18:04:31 INFO DriverCorral$: waiting for localThread to stop for REPL ReplId-5231d-7b5c0-a6423-e
19/10/08 18:04:31 INFO DriverCorral$: ReplId-5231d-7b5c0-a6423-e successfully discarded


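I ran into a similar problem: the driver crashed and stopped with this same message. In my case the job was running only on the driver, which did not have enough capacity for it (a query plus saving the result to S3). I reduced maxRecordsPerFile on the DataFrame write, so the output is now split across more files (before, I had only one) and the write can be parallelized across the nodes. After that change the job ran fine, and the driver no longer crashes or detaches from the notebook. Hope this helps.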
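The change described above can be sketched roughly as follows. This is a minimal illustration, not the answerer's actual code: the helper names `estimated_file_count` and `write_with_record_cap`, the Parquet output format, and the optional `num_partitions` argument are all assumptions. `maxRecordsPerFile` is a standard option of Spark's DataFrame writer; on its own it only caps the rows per file that each write task produces, so the sketch also allows an optional `repartition` to spread the write over several tasks.

```python
import math


def estimated_file_count(total_records, max_records_per_file):
    """Number of files a single write task emits for `total_records` rows.

    Hypothetical helper for illustration: with several partitions, each
    task applies the cap separately, so the real total can be higher.
    """
    if max_records_per_file <= 0:
        raise ValueError("max_records_per_file must be positive")
    return math.ceil(total_records / max_records_per_file)


def write_with_record_cap(df, path, max_records_per_file, num_partitions=None):
    """Write `df` (assumed to be a pyspark.sql.DataFrame) with a row cap per file.

    Parquet is an assumption; the original post does not name the format.
    """
    if num_partitions is not None:
        # Assumption: repartition first so several executors share the write.
        df = df.repartition(num_partitions)
    # Cap the rows per output file, then write to the target path.
    df.write.option("maxRecordsPerFile", max_records_per_file).parquet(path)
```

In a notebook this might be called as `write_with_record_cap(df, "s3://my-bucket/output/", 100000, num_partitions=8)`, where the bucket path and both numbers are placeholders. `estimated_file_count(1000000, 100000)` gives 10 for a single partition.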

