pandas - RuntimeError: Unsupported type in conversion to Arrow: VectorUDT


Reposted by: 行者123. Last updated: 2023-12-03 16:52:03


I want to convert a large Spark DataFrame with more than 1,000,000 rows to pandas. I tried converting the Spark DataFrame to a pandas DataFrame with the following code:

spark.conf.set("spark.sql.execution.arrow.enabled", "true")
result.toPandas()

However, I got this error:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pyspark/sql/dataframe.py in toPandas(self)
   1949                 import pyarrow
-> 1950                 to_arrow_schema(self.schema)
   1951                 tables = self._collectAsArrow()

/usr/local/lib/python3.6/dist-packages/pyspark/sql/types.py in to_arrow_schema(schema)
   1650     fields = [pa.field(field.name, to_arrow_type(field.dataType), nullable=field.nullable)
-> 1651               for field in schema]
   1652     return pa.schema(fields)

/usr/local/lib/python3.6/dist-packages/pyspark/sql/types.py in <listcomp>(.0)
   1650     fields = [pa.field(field.name, to_arrow_type(field.dataType), nullable=field.nullable)
-> 1651               for field in schema]
   1652     return pa.schema(fields)

/usr/local/lib/python3.6/dist-packages/pyspark/sql/types.py in to_arrow_type(dt)
   1641     else:
-> 1642         raise TypeError("Unsupported type in conversion to Arrow: " + str(dt))
   1643     return arrow_type

TypeError: Unsupported type in conversion to Arrow: VectorUDT

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-138-4e12457ff4d5> in <module>()
      1 spark.conf.set("spark.sql.execution.arrow.enabled", "true")
----> 2 result.toPandas()

/usr/local/lib/python3.6/dist-packages/pyspark/sql/dataframe.py in toPandas(self)
   1962                     "'spark.sql.execution.arrow.enabled' is set to true. Please set it to false "
   1963                     "to disable this.")
-> 1964                 raise RuntimeError("%s\n%s" % (_exception_message(e), msg))
   1965         else:
   1966             pdf = pd.DataFrame.from_records(self.collect(), columns=self.columns)

RuntimeError: Unsupported type in conversion to Arrow: VectorUDT
Note: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true. Please set it to false to disable this.

This fails, but it works if I set the Arrow option to false. That is far too slow, though. Any ideas?

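Best answer

Arrow supports only a small set of types, and Spark UserDefinedTypes, including the ml and mllib VectorUDTs, are not among them.

If you want to use Arrow, you have to convert the data to a supported format first. One possible solution is to expand the Vectors into columns, as described in "How to split Vector into columns - using PySpark".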
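As a rough illustration of expanding a vector into columns, the sketch below assumes Spark 3.0 or later, where pyspark.ml.functions.vector_to_array is available (on Spark 2.x, as in the traceback above, a UDF would be needed instead). The helper names element_column_names and split_vector_column are hypothetical, and the vector length is assumed to be known in advance.

```python
def element_column_names(vector_col, size):
    """Build one output column name per vector element, e.g. features_0, features_1."""
    return ["{}_{}".format(vector_col, i) for i in range(size)]


def split_vector_column(df, vector_col, size):
    """Replace a VectorUDT column with `size` plain double columns.

    Assumption: Spark >= 3.0. The Spark imports are local so that the naming
    helper above can be used without a Spark installation.
    """
    from pyspark.ml.functions import vector_to_array
    from pyspark.sql.functions import col

    # Convert the vector to array<double>, a type Arrow can handle.
    arr = vector_to_array(col(vector_col))
    names = element_column_names(vector_col, size)
    elements = [arr[i].alias(name) for i, name in enumerate(names)]

    # Keep every other column and drop the original vector column.
    others = [c for c in df.columns if c != vector_col]
    return df.select(*others, *elements)
```

With a DataFrame like the one in the question, this might be used as split_vector_column(result, "features", 10).toPandas(), where "features" and 10 stand in for the real column name and vector length.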

You can also serialize the column with the to_json method:

from pyspark.sql.functions import to_json

# Replace the vector column with its JSON string representation, which Arrow can transfer.
df = df.withColumn("your_vector_column", to_json("your_vector_column"))
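Once the JSON strings have been collected to pandas, they still need to be decoded. The sketch below is one way to do that with the standard library only. It assumes the JSON mirrors the internal struct of VectorUDT, with type 1 for dense vectors (only values present) and type 0 for sparse vectors (size, indices, values); this layout is an assumption worth checking against a row of your own output. The function name vector_json_to_list is hypothetical.

```python
import json


def vector_json_to_list(text):
    """Decode one JSON-serialized vector into a dense list of floats.

    Assumed layout (verify against your own data):
      dense:  {"type": 1, "values": [...]}
      sparse: {"type": 0, "size": n, "indices": [...], "values": [...]}
    """
    obj = json.loads(text)
    values = obj.get("values", [])

    if obj.get("type") == 1:
        # Dense vector: the values are already in position.
        return [float(v) for v in values]

    # Sparse vector: start from zeros and fill in the stored entries.
    dense = [0.0] * obj["size"]
    for index, value in zip(obj["indices"], values):
        dense[index] = float(value)
    return dense
```

On the pandas side this could be applied per row, for example pdf["your_vector_column"].apply(vector_json_to_list), where pdf is the DataFrame returned by toPandas.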

But if the data is large enough for toPandas to become a serious bottleneck, I would reconsider collecting data like this at all.

The original question, "pandas - RuntimeError: Unsupported type in conversion to Arrow: VectorUDT", is on Stack Overflow: https://stackoverflow.com/questions/51175500/
