amazon-web-services - SageMaker Studio PySpark example fails

Reposted. Author: 行者123. Updated: 2023-12-05 03:45:31
When I try to run the SageMaker-provided example that uses PySpark in SageMaker Studio:

import os

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

import sagemaker
from sagemaker import get_execution_role
import sagemaker_pyspark

role = get_execution_role()

# Configure Spark to use the SageMaker Spark dependency jars
jars = sagemaker_pyspark.classpath_jars()

classpath = ":".join(sagemaker_pyspark.classpath_jars())

# See the SageMaker Spark Github repo under sagemaker-pyspark-sdk
# to learn how to connect to a remote EMR cluster running Spark from a Notebook Instance.
spark = SparkSession.builder.config("spark.driver.extraClassPath", classpath)\
.master("local[*]").getOrCreate()

I get the following exception:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-6-c8f6fff0daaf> in <module>
19 # to learn how to connect to a remote EMR cluster running Spark from a Notebook Instance.
20 spark = SparkSession.builder.config("spark.driver.extraClassPath", classpath)\
---> 21 .master("local[*]").getOrCreate()

/opt/conda/lib/python3.6/site-packages/pyspark/sql/session.py in getOrCreate(self)
171 for key, value in self._options.items():
172 sparkConf.set(key, value)
--> 173 sc = SparkContext.getOrCreate(sparkConf)
174 # This SparkContext may be an existing one.
175 for key, value in self._options.items():

/opt/conda/lib/python3.6/site-packages/pyspark/context.py in getOrCreate(cls, conf)
361 with SparkContext._lock:
362 if SparkContext._active_spark_context is None:
--> 363 SparkContext(conf=conf or SparkConf())
364 return SparkContext._active_spark_context
365

/opt/conda/lib/python3.6/site-packages/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
127 " note this option will be removed in Spark 3.0")
128
--> 129 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
130 try:
131 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

/opt/conda/lib/python3.6/site-packages/pyspark/context.py in _ensure_initialized(cls, instance, gateway, conf)
310 with SparkContext._lock:
311 if not SparkContext._gateway:
--> 312 SparkContext._gateway = gateway or launch_gateway(conf)
313 SparkContext._jvm = SparkContext._gateway.jvm
314

/opt/conda/lib/python3.6/site-packages/pyspark/java_gateway.py in launch_gateway(conf)
44 :return: a JVM gateway
45 """
---> 46 return _launch_gateway(conf)
47
48

/opt/conda/lib/python3.6/site-packages/pyspark/java_gateway.py in _launch_gateway(conf, insecure)
106
107 if not os.path.isfile(conn_info_file):
--> 108 raise Exception("Java gateway process exited before sending its port number")
109
110 with open(conn_info_file, "rb") as info:

Exception: Java gateway process exited before sending its port number

Before running the example, I installed pyspark and sagemaker_pyspark with pip from inside the notebook. I am also using the SparkMagic kernel from the SageMaker kernel gallery.

Best answer

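You are probably hitting this problem because this notebook is designed to be run when you have an EMR cluster. I suggest launching a notebook on SageMaker with the conda_python3 kernel rather than the SparkMagic kernel. You will need to pip install pyspark and sagemaker_pyspark, but it should then work with the code you posted.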
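
After switching kernels, a quick preflight check may help confirm that the pieces the posted code depends on are actually visible to the new kernel. The sketch below is only an illustration: the function names preflight_report and missing_prerequisites are hypothetical, and the Java check is an assumption that does not come from the answer. It is included because PySpark starts a JVM in local mode, and "Java gateway process exited before sending its port number" is often reported when no java executable can be found.

```python
import importlib.util
import os
import shutil


def preflight_report(packages=("pyspark", "sagemaker_pyspark")):
    """Report which prerequisites for local-mode PySpark appear to be present.

    Hypothetical helper: it only inspects the current kernel's environment
    and does not start Spark.
    """
    report = {}
    for name in packages:
        # find_spec returns None when a top-level package is not importable
        report[name] = importlib.util.find_spec(name) is not None

    # Assumption: a Java runtime is needed because PySpark launches a JVM.
    java_home = os.environ.get("JAVA_HOME")
    java_on_path = shutil.which("java") is not None
    java_in_home = bool(java_home) and os.path.isfile(
        os.path.join(java_home, "bin", "java")
    )
    report["java"] = java_on_path or java_in_home
    return report


def missing_prerequisites(report):
    """Return the sorted names of all entries that were reported as absent."""
    return sorted(name for name, present in report.items() if not present)


missing = missing_prerequisites(preflight_report())
if missing:
    print("Missing before SparkSession can start:", ", ".join(missing))
else:
    print("pyspark, sagemaker_pyspark and java were all found")
```

If either package is listed as missing, installing it with pip from the same kernel (as the question already did) and rerunning the check should clear that entry. A missing java entry points at the kernel image rather than at the Python packages.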

Regarding "amazon-web-services - SageMaker Studio PySpark example fails", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65770913/
