
apache-spark - Can I run a pyspark jupyter notebook in cluster deploy mode?


Context:
The cluster is configured as follows:

  • Everything runs from docker files.
  • node1: Spark master
  • node2: JupyterHub (this is also where I run my notebooks)
  • node3-7: Spark worker nodes
  • I can telnet and ping from my worker nodes to node2, and vice versa, over spark's default ports (a minimal connectivity check is sketched after this list).
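A minimal sketch of that connectivity check in Python, since the rest of this post uses Python; the hostnames node1/node2 and the ports are assumptions (7077 is the Spark standalone master's default port, 8000 is JupyterHub's default proxy port):

    import socket

    def can_connect(host, port, timeout=5):
        """Return True if a TCP connection to host:port succeeds, as telnet would."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    print(can_connect("node1", 7077))  # worker/notebook node -> Spark master
    print(can_connect("node2", 8000))  # worker node -> JupyterHub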

    Problem:
    I am trying to create a spark session from a pyspark jupyter notebook, with the session running in cluster deploy mode. That is, I want the driver to run on a node other than the one hosting the jupyter notebook. Right now I can run jobs on the cluster, but only with the driver on node2.

    After a lot of digging, I found a stackoverflow post claiming that an interactive spark shell can only run in client deploy mode (the driver lives on the machine you are working from). The post goes on to say that something like jupyter hub therefore cannot work in cluster deploy mode either, but I cannot find any documentation confirming this. Can someone confirm whether jupyter hub can run in cluster mode at all?

    My attempt at creating a spark session in cluster deploy mode:
    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .enableHiveSupport() \
        .config("spark.local.ip", <node 3 ip>) \
        .config("spark.driver.host", <node 3 ip>) \
        .config('spark.submit.deployMode', 'cluster') \
        .getOrCreate()

    Error:
    /usr/spark/python/pyspark/sql/session.py in getOrCreate(self)
    167 for key, value in self._options.items():
    168 sparkConf.set(key, value)
    --> 169 sc = SparkContext.getOrCreate(sparkConf)
    170 # This SparkContext may be an existing one.
    171 for key, value in self._options.items():

    /usr/spark/python/pyspark/context.py in getOrCreate(cls, conf)
    308 with SparkContext._lock:
    309 if SparkContext._active_spark_context is None:
    --> 310 SparkContext(conf=conf or SparkConf())
    311 return SparkContext._active_spark_context
    312

    /usr/spark/python/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    113 """
    114 self._callsite = first_spark_call() or CallSite(None, None, None)
    --> 115 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    116 try:
    117 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

    /usr/spark/python/pyspark/context.py in _ensure_initialized(cls, instance, gateway, conf)
    257 with SparkContext._lock:
    258 if not SparkContext._gateway:
    --> 259 SparkContext._gateway = gateway or launch_gateway(conf)
    260 SparkContext._jvm = SparkContext._gateway.jvm
    261

    /usr/spark/python/pyspark/java_gateway.py in launch_gateway(conf)
    93 callback_socket.close()
    94 if gateway_port is None:
    ---> 95 raise Exception("Java gateway process exited before sending the driver its port number")
    96
    97 # In Windows, ensure the Java child processes do not linger after Python has exited.

    Exception: Java gateway process exited before sending the driver its port number

    Best Answer

    You cannot use cluster mode with PySpark at all:

    Currently, standalone mode does not support cluster mode for Python applications.



    Even if you could, cluster mode is not applicable in an interactive environment:

    case (_, CLUSTER) if isShell(args.primaryResource) =>
      error("Cluster deploy mode is not applicable to Spark shells.")
    case (_, CLUSTER) if isSqlShell(args.mainClass) =>
      error("Cluster deploy mode is not applicable to Spark SQL shell.")

    Regarding apache-spark - Can I run a pyspark jupyter notebook in cluster deploy mode?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45997150/
