python - Spark Exception: Python in worker has different version 3.4 than that in driver 3.5


I'm using Amazon EC2; my master and my development machine are one and the same. I also have another instance for a single worker.

I'm new to this, but I've managed to get Spark working in standalone mode. Now I'm trying a cluster. The master and the worker are active (I can see their web UIs, and they are running).

I have Spark 2.0, and I've installed the latest Anaconda 4.1.1, which ships with Python 3.5.2. On both the worker and the master, if I open pyspark and run sys.version_info, I get 3.5.2. I've also set all the environment variables correctly (as shown in other posts on Stack Overflow and Google), e.g. PYSPARK_PYTHON.

There is no Python 3.4 installed anywhere that I can find, so I'd like to know how to fix this.
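
Note that opening pyspark on each machine only checks the interpreter your login shell resolves; the executor processes launched by the worker daemon may pick up a different python. A minimal diagnostic sketch (my addition, assuming an active SparkContext named sc) that asks the executors themselves which interpreter they run:

import sys

def interpreter_info(_):
    # Runs on the executors, not the driver
    return (sys.executable, "%d.%d" % sys.version_info[:2])

sc.parallelize(range(2), 2).map(interpreter_info).distinct().collect()
# e.g. [('/usr/bin/python3.4', '3.4')]

While the mismatch persists this job fails with the same exception as below, so it is mainly useful for verifying a fix.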

I get the error by running:

rdd = sc.parallelize([1,2,3])
rdd.count()

The error occurs on the count() call:

16/08/13 18:44:31 ERROR Executor: Exception in task 1.0 in stage 2.0 (TID 17)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 123, in main
("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 3.4 than that in driver 3.5, PySpark cannot run with different minor versions

at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/08/13 18:44:31 ERROR Executor: Exception in task 1.1 in stage 2.0 (TID 18)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 123, in main
("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 3.4 than that in driver 3.5, PySpark cannot run with different minor versions
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Best Answer

Since you are already using Anaconda, you can simply create an environment with the required Python version:

conda create --name foo python=3.4
source activate foo

python --version
## Python 3.4.5 :: Continuum Analytics, Inc

and use it as PYSPARK_DRIVER_PYTHON:

export PYSPARK_DRIVER_PYTHON=/path/to/anaconda/envs/foo/bin/python
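
This works because PYSPARK_DRIVER_PYTHON only controls the driver process, so pointing it at a 3.4 environment makes the driver match the workers' existing Python 3.4. If you would rather keep Python 3.5 everywhere instead (my addition, not part of the answer above, and assuming Anaconda sits at the same path on every node), point the executors' interpreter at it via PYSPARK_PYTHON in conf/spark-env.sh on each machine:

export PYSPARK_PYTHON=/path/to/anaconda/bin/python

Either way, the driver and the workers just need to agree on the Python minor version, which is exactly what the exception checks.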

Regarding python - Spark Exception: Python in worker has different version 3.4 than that in driver 3.5, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38936150/
