
apache-spark - Spark 2.0 : Redefining SparkSession params through GetOrCreate and NOT seeing changes in WebUI


I'm using Spark 2.0 with PySpark.
I'm redefining SparkSession parameters through the GetOrCreate method that was introduced in 2.0:

This method first checks whether there is a valid global default SparkSession, and if yes, return that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.

In case an existing SparkSession is returned, the config options specified in this builder will be applied to the existing SparkSession.


https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.getOrCreate
So far so good:
from pyspark import SparkConf

SparkConf().toDebugString()
'spark.app.name=pyspark-shell\nspark.master=local[2]\nspark.submit.deployMode=client'

spark.conf.get("spark.app.name")
'pyspark-shell'
Then I redefine the SparkSession config, expecting to see the change in the WebUI:

appName(name)
Sets a name for the application, which will be shown in the Spark web UI.


https://spark.apache.org/docs/2.0.1/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder.appName
c = SparkConf()
(c
.setAppName("MyApp")
.setMaster("local")
.set("spark.driver.memory","1g")
)

from pyspark.sql import SparkSession
(SparkSession
.builder
.enableHiveSupport() # metastore, serdes, Hive udf
.config(conf=c)
.getOrCreate())

spark.conf.get("spark.app.name")
'MyApp'
Now, when I go to localhost:4040, I expect to see MyApp as the application name.
However, I still see pyspark-shell in the application UI. Where am I going wrong?
Thanks in advance!

Best Answer

I believe the documentation is a bit misleading here. When you work with Scala you actually see a warning like this:

... WARN SparkSession$Builder: Use an existing SparkSession, some configuration may not take effect.

Before Spark 2.0, the separation between the contexts was clearer:
  • SparkContext configuration cannot be modified at runtime. You have to stop the existing context first.
  • SQLContext configuration can be modified at runtime.
  • spark.app.name, like many other options, is bound to the SparkContext and cannot be modified without stopping the context (see the sketch just below).
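
To make that split concrete, here is a minimal, hypothetical PySpark sketch of the pre-2.0 behaviour (the names and option values are illustrative, not taken from the question):

    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SQLContext

    # A fresh 1.x-style context pair (hypothetical names):
    sc = SparkContext(conf=SparkConf().setAppName("old-name").setMaster("local"))
    sqlContext = SQLContext(sc)

    # SQL-level options could be changed at runtime:
    sqlContext.setConf("spark.sql.shuffle.partitions", "10")

    # SparkContext-level options such as spark.app.name could not;
    # the only way to change them was to stop the context and start a new one:
    sc.stop()
    sc = SparkContext(conf=SparkConf().setAppName("new-name").setMaster("local"))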

    Reusing an existing SparkContext / SparkSession

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    spark.conf.get("spark.sql.shuffle.partitions")

    String = 200

    val conf = new SparkConf()
    .setAppName("foo")
    .set("spark.sql.shuffle.partitions", "2001")

    val spark = SparkSession.builder.config(conf).getOrCreate()

    ... WARN SparkSession$Builder: Use an existing SparkSession ...
    spark: org.apache.spark.sql.SparkSession = ...

    spark.conf.get("spark.sql.shuffle.partitions")

    String = 2001

    The spark.app.name config is updated:

    spark.conf.get("spark.app.name")

    String = foo

    but it doesn't affect the SparkContext:

    spark.sparkContext.appName

    String = Spark shell

    Stopping the existing SparkContext / SparkSession

    Now let's stop the session and repeat the process:

    spark.stop
    val spark = SparkSession.builder.config(conf).getOrCreate()

    ...  WARN SparkContext: Use an existing SparkContext ...
    spark: org.apache.spark.sql.SparkSession = ...

    spark.sparkContext.appName

    String = foo

    Interestingly enough, when we stop the session we still get a warning about using an existing SparkContext, but you can check that it has actually been stopped.
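
    Applied to the PySpark setup from the question, a minimal sketch of the same stop-and-rebuild fix might look like this (it reuses the question's conf values; spark is the session created by the pyspark shell):

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # Stop the running session (this also stops the underlying SparkContext) ...
    spark.stop()

    # ... and rebuild it with the desired configuration.
    c = (SparkConf()
         .setAppName("MyApp")
         .setMaster("local")
         .set("spark.driver.memory", "1g"))

    spark = (SparkSession
             .builder
             .config(conf=c)
             .getOrCreate())

    spark.sparkContext.appName  # 'MyApp' -- the name the web UI at localhost:4040 should now report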

    Regarding apache-spark - Spark 2.0: Redefining SparkSession params through GetOrCreate and NOT seeing changes in WebUI, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40701518/
