
PicklingError: Could not serialize object: IndexError: tuple index out of range




I started pyspark from cmd and ran the steps below to practice my skills.



C:\Users\Administrator>SUCCESS: The process with PID 5328 (child process of PID 4476) has been terminated.
SUCCESS: The process with PID 4476 (child process of PID 1092) has been terminated.
SUCCESS: The process with PID 1092 (child process of PID 3952) has been terminated.
pyspark
Python 3.11.1 (tags/v3.11.1:a7a450f, Dec 6 2022, 19:58:39) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/08 20:07:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 3.3.1
/_/

Using Python version 3.11.1 (tags/v3.11.1:a7a450f, Dec 6 2022 19:58:39)
Spark context Web UI available at http://Mohit:4040
Spark context available as 'sc' (master = local[*], app id = local-1673188677388).
SparkSession available as 'spark'.
>>> 23/01/08 20:08:10 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
a = sc.parallelize([1,2,3,4,5,6,7,8,9,10])

When I execute a.take(1), I get a "_pickle.PicklingError: Could not serialize object: IndexError: tuple index out of range" error and I cannot figure out why. When the same code is run on Google Colab, it doesn't throw any error. Below is what I get in the console.



>>> a.take(1)
Traceback (most recent call last):
File "C:\Spark\python\pyspark\serializers.py", line 458, in dumps
return cloudpickle.dumps(obj, pickle_protocol)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle_fast.py", line 602, in dump
return Pickler.dump(self, obj)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle_fast.py", line 692, in reducer_override
return self._function_reduce(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle_fast.py", line 565, in _function_reduce
return self._dynamic_function_reduce(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle_fast.py", line 546, in _dynamic_function_reduce
state = _function_getstate(func)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle_fast.py", line 157, in _function_getstate
f_globals_ref = _extract_code_globals(func.__code__)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle.py", line 334, in _extract_code_globals
out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle.py", line 334, in <dictcomp>
out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
~~~~~^^^^^^^
IndexError: tuple index out of range
Traceback (most recent call last):
File "C:\Spark\python\pyspark\serializers.py", line 458, in dumps
return cloudpickle.dumps(obj, pickle_protocol)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle_fast.py", line 602, in dump
return Pickler.dump(self, obj)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle_fast.py", line 692, in reducer_override
return self._function_reduce(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle_fast.py", line 565, in _function_reduce
return self._dynamic_function_reduce(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle_fast.py", line 546, in _dynamic_function_reduce
state = _function_getstate(func)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle_fast.py", line 157, in _function_getstate
f_globals_ref = _extract_code_globals(func.__code__)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle.py", line 334, in _extract_code_globals
out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\cloudpickle\cloudpickle.py", line 334, in <dictcomp>
out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
~~~~~^^^^^^^
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Spark\python\pyspark\rdd.py", line 1883, in take
res = self.context.runJob(self, takeUpToNumLeft, p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\context.py", line 1486, in runJob
sock_info = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\rdd.py", line 3505, in _jrdd
wrapped_func = _wrap_function(
^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\rdd.py", line 3362, in _wrap_function
pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\rdd.py", line 3345, in _prepare_for_python_RDD
pickled_command = ser.dumps(command)
^^^^^^^^^^^^^^^^^^
File "C:\Spark\python\pyspark\serializers.py", line 468, in dumps
raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: IndexError: tuple index out of range

It should return [1] as the answer, but instead it throws this error. Is it because of an incorrect installation?



Packages used: spark-3.3.1-bin-hadoop3.tgz, Java(TM) SE Runtime Environment (build 1.8.0_351-b10), Python 3.11.1



Can anyone help in troubleshooting this? Many thanks in advance.



More replies

This might be a Python version incompatibility issue; can you recheck with version 3.8?


I tried with Python 3.8.5 and now it shows a different error, a Java IO exception, even though I pip-installed py4j and already have the JDK installed.


I fixed it by downgrading to Python 3.9; then I installed pip for that version with python3.9 -m ensurepip, after which you can install PySpark with python3.9 -m pip install pyspark. After that you will get an error saying you are running PySpark with Python 3.11 instead of 3.9. It's an environment variable problem; you have to change two variables:


I use JupyterLab in VS Code, so in order to have the right variables in VS Code's JupyterLab you have to open the Jupyter extension's settings.json and add "jupyter.runStartupCommands": [ "import os\nos.environ['PYSPARK_PYTHON']='/bin/python3.9'\nos.environ['PYSPARK_DRIVER_PYTHON']='/bin/python3.9'\n" ]


If instead you want to use PySpark with Python 3.9 system-wide, you can add export PYSPARK_PYTHON='/bin/python3.9' and export PYSPARK_DRIVER_PYTHON='/bin/python3.9' to your .bashrc.
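The same idea also works from inside a plain Python script or notebook, as long as the variables are set before the SparkSession is created. A minimal sketch, assuming PySpark is installed and using placeholder interpreter paths that you would replace with your own:

import os

# Point the driver and the workers at the same interpreter *before* any Spark
# code runs; the paths below are placeholders for your own Python 3.9 install.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.9"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3.9"

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("env-check").getOrCreate()
# A round trip through the executors; this should print [1] once the driver
# and worker Python versions agree.
print(spark.sparkContext.parallelize([1, 2, 3]).take(1))
spark.stop()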


Recommended answers

According to https://github.com/apache/spark/pull/38987 you will need Spark 3.4.0 to use Python 3.11, which at the time of writing has not yet been released at https://spark.apache.org/downloads.html. Python 3.10 should work.
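If you are not sure which combination you are actually running, a quick check at the top of a session can save time. A minimal sketch (the "Python 3.11 needs Spark 3.4" cut-off is the one stated in the pull request linked above):

import sys
import pyspark

driver_python = sys.version_info[:2]
spark_version = tuple(int(part) for part in pyspark.__version__.split(".")[:2])

print("Driver Python:", ".".join(str(n) for n in driver_python))
print("PySpark:", pyspark.__version__)

# Python 3.11 support only lands in Spark 3.4.0, per the pull request above.
if driver_python >= (3, 11) and spark_version < (3, 4):
    print("This PySpark release does not support Python 3.11; use Python 3.10 or older.")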




As of 3/2/23, I had the exact same problem, and as indicated above, I uninstalled Python 3.11 and installed version 3.10.9, and that solved it!



More replies

I even tried with Python 3.8.5, but the same error persists. I ran standalone Spark in cmd and it works without a flaw and gives the correct output. I am running two versions of Python, 3.8.5 and 3.11.1, with 3.8.5 set as the default. It still throws the same error. Any corrections to follow?


@MohitAswani To ensure you are absolutely not using Python 3.11 I would uninstall it completely and then see what happens, as you could have other environment variables or configuration in Spark that still point to 3.11 instead of 3.8.


I have just installed Python 3.10.11 alongside 3.11.x and used 3.10 as the base interpreter for a venv with PySpark 3.4 and a Spark 3.4.0 server. It works seamlessly too.

