
spark-koalas - Koalas throws "Can't get attribute '_fill_function' on <module 'pyspark.cloudpickle'>" when performing a simple head() call

When I put the following code in a Python script and run it directly with python, I get the error below.
When I start a pyspark session and then import Koalas, creating the DataFrame and calling head() works fine and gives me the expected output.
Is there a specific way the SparkSession needs to be set up for Koalas to work?

from pyspark.sql import SparkSession
import pandas as pd
import databricks.koalas as ks

# Create a local SparkSession
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("Pycedro Spark Application") \
    .getOrCreate()

# Build a Koalas DataFrame and print its first rows
kdf = ks.DataFrame({"a": [4, 5, 6],
                    "b": [7, 8, 9],
                    "c": [10, 11, 12]})

print(kdf.head())
Error when running it as a Python script:
    File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 586, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 69, in read_command
command = serializer._read_with_length(file)
File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 160, in _read_with_length
return self.loads(obj)
File "/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 430, in loads
return pickle.loads(obj, encoding=encoding)
AttributeError: Can't get attribute '_fill_function' on <module 'pyspark.cloudpickle' from '/usr/local/Cellar/apache-spark/3.1.1/libexec/python/lib/pyspark.zip/pyspark/cloudpickle/__init__.py'>

at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:517)
[...]
Versions:
Koalas: 1.7.0
PySpark: 3.0.2
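
The traceback above points at the Spark install under /usr/local/Cellar (3.1.1), while the pip-installed pyspark reports 3.0.2. A minimal sketch for printing which PySpark build and which Koalas version a plain python run actually imports (whether that mismatch is the culprit is only a guess):

import pyspark
import databricks.koalas as ks

print(pyspark.__version__)   # 3.0.2 here
print(pyspark.__file__)      # shows which installation the interpreter picked up
print(ks.__version__)        # 1.7.0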

Best Answer

I ran into a similar problem with PySpark. Upgrading PySpark from 3.0.2 to 3.1.2 solved it for me. Here is some more information (a short verification sketch follows the notes below):

  • Hadoop version: 3.2.2
  • Spark version: 3.1.2
  • Python version: 3.8.5

  • Interestingly,
    df = spark.read.csv("hdfs:///data.csv")
    df.show(2)
    worked just fine, but
    rdd = sc.textFile("hdfs:///data.csv")
    rdd.take(2)
    led to the following error:
    AttributeError: Can't get attribute '_fill_function' on <module 'pyspark.cloudpickle' from '/opt/spark/python/lib/pyspark.zip/pyspark/cloudpickle/__init__.py'>
    Upgrading PySpark fixed this as well.
    The idea to upgrade came from the following ticket:
    https://issues.apache.org/jira/browse/SPARK-29536
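
To double-check the fix locally, something like the following can be run after upgrading the pip package (a minimal sketch, assuming a pip-managed PySpark, e.g. pip install pyspark==3.1.2; it simply reuses the Koalas snippet from the question):

# Verify the upgraded versions and re-run the previously failing head() call
import pyspark
import databricks.koalas as ks
from pyspark.sql import SparkSession

print(pyspark.__version__)   # expected: 3.1.2 after the upgrade
print(ks.__version__)        # 1.7.0 in the question

spark = SparkSession.builder.master("local[*]").getOrCreate()
kdf = ks.DataFrame({"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]})
print(kdf.head())            # should now print the three rows instead of raising the pickle error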

A similar question about spark-koalas - Koalas throwing "Can't get attribute '_fill_function' on <module 'pyspark.cloudpickle'>" on a simple head() call can be found on Stack Overflow: https://stackoverflow.com/questions/66746285/
