I'm running pyspark 3.1.1 with java openjdk 11.0.11 on my Ubuntu machine. I create some random data in a dataframe:
import numpy as np
from pyspark.sql.types import StructType, StructField, FloatType

# generate a 10 x 4 array of random floats
n, p = (10, 4)
data = np.random.rand(n, p)
# create the dataframe with an explicit all-float schema
schem = StructType([StructField('col_%d' % i, FloatType(), False) for i in range(p)])
randData = spark.createDataFrame(data.tolist(), schema=schem)
randData.show()
This runs fine on my Windows 10 machine with pyspark 3.0.1, but on the system above I get this stack trace:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-21-6f093bb5efc1> in <module>
7 # create the dataframe
8 randData = spark.createDataFrame(data.tolist(), schema=schem)
----> 9 randData.show()
~/spark-3.1.1-bin-hadoop2.7/python/pyspark/sql/dataframe.py in show(self, n, truncate, vertical)
482 """
483 if isinstance(truncate, bool) and truncate:
--> 484 print(self._jdf.showString(n, 20, vertical))
485 else:
486 print(self._jdf.showString(n, int(truncate), vertical))
~/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306
~/spark-3.1.1-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
109 def deco(*a, **kw):
110 try:
--> 111 return f(*a, **kw)
112 except py4j.protocol.Py4JJavaError as e:
113 converted = convert_exception(e.java_exception)
~/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o221.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 34.0 (TID 619) (192.168.150.128 executor driver): org.apache.spark.SparkException:
Bad data in pyspark.daemon's standard output. Invalid port number:
458961458 (0x1b5b3232)
Python command to execute the daemon was:
ipython3 -m pyspark.daemon
Check that you don't have any unexpected modules or libraries in
your PYTHONPATH:
/home/ahowe42/spark-3.1.1-bin-hadoop2.7/python/lib/pyspark.zip:/home/ahowe42/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip:/home/ahowe42/spark-3.1.1-bin-hadoop2.7/jars/spark-core_2.12-3.1.1.jar:/home/ahowe42/spark-3.1.1-bin-hadoop2.7/python:
Also, check if you have a sitecustomize.py module in your python path,
or in your python installation, that is printing to standard output
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:238)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2253)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2202)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2201)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2201)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1078)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1078)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1078)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2440)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2382)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2371)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2202)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2223)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2242)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:472)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:425)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:47)
at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3696)
at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2722)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2722)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2929)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:301)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:338)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException:
Bad data in pyspark.daemon's standard output. Invalid port number:
458961458 (0x1b5b3232)
Python command to execute the daemon was:
ipython3 -m pyspark.daemon
Check that you don't have any unexpected modules or libraries in
your PYTHONPATH:
/home/ahowe42/spark-3.1.1-bin-hadoop2.7/python/lib/pyspark.zip:/home/ahowe42/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip:/home/ahowe42/spark-3.1.1-bin-hadoop2.7/jars/spark-core_2.12-3.1.1.jar:/home/ahowe42/spark-3.1.1-bin-hadoop2.7/python:
Also, check if you have a sitecustomize.py module in your python path,
or in your python installation, that is printing to standard output
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:238)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
I get the same stack trace with this simple command:
spark.createDataFrame([[1, 2, 3], [4, 5, 6]], ['a', 'b', 'c']).show()
JAVA_HOME is set to /usr/lib/jvm/java-11-openjdk-amd64. I have the following settings in my bashrc:
export PYTHONPATH=/home/ahowe42/anaconda3/bin
export SPARK_HOME=/home/ahowe42/spark-3.1.1-bin-hadoop2.7
export PYSPARK_PYTHON=ipython3
export PYTHONPATH=$PYTHONPATH:$SPARK_HOME/python
export PATH=$PATH:$SPARK_HOME/bin:$PYTHONPATH:$JAVA_HOME/jre/bin
I load and initialize pyspark with:
import findspark
findspark.init()
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import *  # avg, count, expr
from pyspark.sql.types import *
sc = pyspark.SparkContext()
spark = SparkSession(sc)
spark.sparkContext.appName = 'exploreReadWrite'
spark
Best answer
This is really a comment, but I don't have the reputation to comment yet. Have you tried which python? Similar errors about the Spark port number point to a Python version mismatch between libraries. As for your $PYTHONPATH: it is set in several places and can be updated by any process your shell invokes, so the value from your .bashrc may have been changed by the time your server sees it. $PYTHONPATH is, I believe, searched left to right, so if there is a Python runtime in your anaconda directory, I'd expect it to be picked up ahead of the Spark one when you just type py.
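One way to check — a minimal sketch, assuming the spark session from the question is live and the daemon can start at all — is to compare the interpreter on the driver against the one the executors report:

import sys

# Interpreter running the driver (the one launched by ipython3/Jupyter).
print(sys.executable, sys.version)

# Interpreter the executors spawn through pyspark.daemon; a major/minor
# mismatch with the driver's version would show up here.
worker_version = (
    spark.sparkContext
         .parallelize([0], numSlices=1)
         .map(lambda _: __import__('sys').version)
         .collect()[0]
)
print(worker_version)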
Could it be that your anaconda install and your Spark install are running different versions of Python?
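If they do differ, a hedged fix is to pin both sides to one interpreter before findspark initializes anything. A sketch follows; the anaconda path is an assumption inferred from the asker's PYTHONPATH, not something the answer confirms:

import os

# Force workers and driver onto the same interpreter. The path below is
# an assumption based on the asker's PYTHONPATH (adjust for your install).
os.environ['PYSPARK_PYTHON'] = '/home/ahowe42/anaconda3/bin/python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = os.environ['PYSPARK_PYTHON']

import findspark
findspark.init()

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('exploreReadWrite').getOrCreate()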
Regarding java - Pyspark 3.1.1 Py4JJavaError, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67533296/