
python - PySpark fails in Jupyter after setting PYSPARK_SUBMIT_ARGS


I am trying to load a Spark package in a Jupyter notebook with Spark 2.2.1, which otherwise runs fine. As soon as I add

%env PYSPARK_SUBMIT_ARGS='--packages com.databricks:spark-redshift_2.10:2.0.1 pyspark-shell'

I get this error when trying to create the context:

---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-5-b25d0ed9494e> in <module>()
----> 1 sc = SparkContext.getOrCreate()
2 sql_context = SQLContext(sc)

/usr/local/spark/spark-2.2.1-bin-without-hadoop/python/pyspark/context.py in getOrCreate(cls, conf)
332 with SparkContext._lock:
333 if SparkContext._active_spark_context is None:
--> 334 SparkContext(conf=conf or SparkConf())
335 return SparkContext._active_spark_context
336

/usr/local/spark/spark-2.2.1-bin-without-hadoop/python/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
113 """
114 self._callsite = first_spark_call() or CallSite(None, None, None)
--> 115 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
116 try:
117 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

/usr/local/spark/spark-2.2.1-bin-without-hadoop/python/pyspark/context.py in _ensure_initialized(cls, instance, gateway, conf)
281 with SparkContext._lock:
282 if not SparkContext._gateway:
--> 283 SparkContext._gateway = gateway or launch_gateway(conf)
284 SparkContext._jvm = SparkContext._gateway.jvm
285

/usr/local/spark/spark-2.2.1-bin-without-hadoop/python/pyspark/java_gateway.py in launch_gateway(conf)
93 callback_socket.close()
94 if gateway_port is None:
---> 95 raise Exception("Java gateway process exited before sending the driver its port number")
96
97 # In Windows, ensure the Java child processes do not linger after Python has exited.

Exception: Java gateway process exited before sending the driver its port number

Again, everything works fine as long as PYSPARK_SUBMIT_ARGS is left unset (or set to just pyspark-shell). As soon as I add anything else (for example, if I set it to --master local pyspark-shell), I get this error; the variants I tried are illustrated below. Searching on Google, most people suggest simply removing PYSPARK_SUBMIT_ARGS, which I cannot do for obvious reasons.
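For concreteness, these are the kinds of cells I mean (the exact lines are only illustrative of the behavior described above):

works:
%env PYSPARK_SUBMIT_ARGS='pyspark-shell'

fails with the Java gateway error:
%env PYSPARK_SUBMIT_ARGS='--master local pyspark-shell'
%env PYSPARK_SUBMIT_ARGS='--packages com.databricks:spark-redshift_2.10:2.0.1 pyspark-shell'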

I have also tried setting my JAVA_HOME, although I don't see why that should make a difference, since Spark works without that environment variable. The arguments I am passing work fine outside Jupyter with spark-submit and pyspark.

I suppose my first question is: is there any way to get a more detailed error message? Is there a log file somewhere? The current message tells me essentially nothing.

Best Answer

Set PYSPARK_SUBMIT_ARGS as follows, before the SparkContext is initialized:

import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-redshift_2.10:2.0.1 pyspark-shell'
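For example, a minimal cell sequence might look like this (a sketch assuming the same package coordinate as in the question; the SQLContext usage mirrors the traceback above). The assignment must happen before the first SparkContext is created, because launch_gateway reads PYSPARK_SUBMIT_ARGS when it builds the spark-submit command that starts the JVM.

import os
from pyspark import SparkContext
from pyspark.sql import SQLContext

# Must run before any SparkContext exists, i.e. before the Java gateway is launched.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages com.databricks:spark-redshift_2.10:2.0.1 pyspark-shell'
)

sc = SparkContext.getOrCreate()
sql_context = SQLContext(sc)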

Regarding python - PySpark fails in Jupyter after setting PYSPARK_SUBMIT_ARGS, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49020056/
