
apache-spark - Integrating PySpark with Jupyter Notebook

Reposted. Author: 行者123. Updated: 2023-12-02 08:19:30

I am following this site to install Jupyter Notebook and PySpark and to integrate the two.

When I needed to create a "Jupyter profile", I read that "Jupyter profiles" no longer exist, so I proceeded with the following commands instead.

$ mkdir -p ~/.ipython/kernels/pyspark

$ touch ~/.ipython/kernels/pyspark/kernel.json

I opened kernel.json and wrote the following:

{
  "display_name": "pySpark",
  "language": "python",
  "argv": [
    "/usr/bin/python",
    "-m",
    "IPython.kernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "SPARK_HOME": "/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7",
    "PYTHONPATH": "/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7/python:/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip",
    "PYTHONSTARTUP": "/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": "pyspark-shell"
  }
}
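
For reference, a spec like the one above can also be generated with Python's standard json module rather than written by hand; a minimal sketch (paths copied from the kernel.json above, and written to a temporary directory here instead of ~/.ipython/kernels/pyspark):

```python
import json
import os
import tempfile

# Sketch: build the same kernel spec as above and serialize it with json.dump.
# Paths are the ones from the hand-written kernel.json; adjust to your install.
spark_home = "/usr/local/Cellar/spark-2.0.0-bin-hadoop2.7"
spec = {
    "display_name": "pySpark",
    "language": "python",
    "argv": ["/usr/bin/python", "-m", "IPython.kernel", "-f", "{connection_file}"],
    "env": {
        "SPARK_HOME": spark_home,
        "PYTHONPATH": f"{spark_home}/python:{spark_home}/python/lib/py4j-0.10.1-src.zip",
        "PYTHONSTARTUP": f"{spark_home}/python/pyspark/shell.py",
        "PYSPARK_SUBMIT_ARGS": "pyspark-shell",
    },
}

# The real file lives at ~/.ipython/kernels/pyspark/kernel.json;
# a temp directory is used here so the sketch is side-effect free.
path = os.path.join(tempfile.mkdtemp(), "kernel.json")
with open(path, "w") as f:
    json.dump(spec, f, indent=2)
```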

The path to Spark is correct.

However, when I run jupyter console --kernel pyspark, I get the following output:

MacBook:~ Agus$ jupyter console --kernel pyspark
/usr/bin/python: No module named IPython
Traceback (most recent call last):
  File "/usr/local/bin/jupyter-console", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/jupyter_core/application.py", line 267, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/traitlets/config/application.py", line 595, in launch_instance
    app.initialize(argv)
  File "<decorator-gen-113>", line 2, in initialize
  File "/usr/local/lib/python2.7/site-packages/traitlets/config/application.py", line 74, in catch_config_error
    return method(app, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/jupyter_console/app.py", line 137, in initialize
    self.init_shell()
  File "/usr/local/lib/python2.7/site-packages/jupyter_console/app.py", line 110, in init_shell
    client=self.kernel_client,
  File "/usr/local/lib/python2.7/site-packages/traitlets/config/configurable.py", line 412, in instance
    inst = cls(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/jupyter_console/ptshell.py", line 251, in __init__
    self.init_kernel_info()
  File "/usr/local/lib/python2.7/site-packages/jupyter_console/ptshell.py", line 305, in init_kernel_info
    raise RuntimeError("Kernel didn't respond to kernel_info_request")
RuntimeError: Kernel didn't respond to kernel_info_request

Best Answer

There are several ways to integrate PySpark with Jupyter Notebook.

1. Install Apache Toree

pip install jupyter
pip install toree
jupyter toree install --spark_home=path/to/your/spark_directory --interpreters=PySpark

You can check the installation with

 jupyter kernelspec list

You will get an entry for the Toree PySpark kernel:

  apache_toree_pyspark    /home/pauli/.local/share/jupyter/kernels/apache_toree_pyspark

After that, you can install other interpreters such as Scala, SparkR, and SQL if needed:

 jupyter toree install --interpreters=Scala,SparkR,SQL

2. Add these lines to your bashrc

export SPARK_HOME=/path/to/spark-2.2.0
export PATH="$PATH:$SPARK_HOME/bin"
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

Type pyspark in a terminal and it will open a Jupyter notebook with a SparkContext already initialized.
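
Roughly, the pyspark launcher uses those two variables to decide which program to start as the driver. A simplified Python illustration of that substitution (the real launcher is bin/pyspark, a shell script; this is only a sketch of the idea):

```python
# Simplified illustration: with the exports above, pyspark starts
# `jupyter notebook` as the driver instead of the plain Python REPL.
env = {
    "PYSPARK_DRIVER_PYTHON": "jupyter",
    "PYSPARK_DRIVER_PYTHON_OPTS": "notebook",
}

# Build the driver command the same way the launcher conceptually does:
# the driver binary followed by its options, split on whitespace.
driver_cmd = [env["PYSPARK_DRIVER_PYTHON"], *env["PYSPARK_DRIVER_PYTHON_OPTS"].split()]
print(" ".join(driver_cmd))  # jupyter notebook
```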

3. Install pyspark as a Python package only

 pip install pyspark

Now you can import pyspark just like any other Python package.
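
Once installed via pip, you can sanity-check that the package resolves like any other; a small check using only the standard library (prints the install location, or a message if pyspark is missing):

```python
import importlib.util

# Look up pyspark on the import path without actually importing it;
# find_spec returns None when the package is not installed.
spec = importlib.util.find_spec("pyspark")
if spec is not None:
    print("pyspark found at", spec.origin)
else:
    print("pyspark not installed")
```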

  • Regarding apache-spark - Integrating PySpark with Jupyter Notebook, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39149541/
