pandas - Error when converting a Spark DataFrame with dates to a Pandas DataFrame

Reposted by 行者123, updated 2023-12-03 13:33:15

I have a Spark DataFrame with this schema:

root
|-- product_id: integer (nullable = true)
|-- stock: integer (nullable = true)
|-- start_date: date (nullable = true)
|-- end_date: date (nullable = true)

When trying to pass it to a pandas_udf, or to convert it to a Pandas DataFrame with:
pandas_df = spark_df.toPandas()

it returns this error:
AttributeError        Traceback (most recent call last)
<ipython-input-86-4bccc6e8422d> in <module>()
10 # spark_df.printSchema()
11
---> 12 pandas_df = spark_df.toPandas()

/home/.../lib/python2.7/site-packages/pyspark/sql/dataframe.pyc in toPandas(self)
2123 table = pyarrow.Table.from_batches(batches)
2124 pdf = table.to_pandas()
-> 2125 pdf = _check_dataframe_convert_date(pdf, self.schema)
2126 return _check_dataframe_localize_timestamps(pdf, timezone)
2127 else:

/home.../lib/python2.7/site-packages/pyspark/sql/types.pyc in _check_dataframe_convert_date(pdf, schema)
1705 """
1706 for field in schema:
-> 1707 pdf[field.name] = _check_series_convert_date(pdf[field.name], field.dataType)
1708 return pdf
1709

/home/.../lib/python2.7/site-packages/pyspark/sql/types.pyc in _check_series_convert_date(series, data_type)
1690 """
1691 if type(data_type) == DateType:
-> 1692 return series.dt.date
1693 else:
1694 return series

/home/.../lib/python2.7/site-packages/pandas/core/generic.pyc in __getattr__(self, name)
5061 if (name in self._internal_names_set or name in self._metadata or
5062 name in self._accessors):
-> 5063 return object.__getattribute__(self, name)
5064 else:
5065 if self._info_axis._can_hold_identifiers_and_holds_name(name):

/home/.../lib/python2.7/site-packages/pandas/core/accessor.pyc in __get__(self, obj, cls)
169 # we're accessing the attribute of the class, i.e., Dataset.geo
170 return self._accessor
--> 171 accessor_obj = self._accessor(obj)
172 # Replace the property with the accessor object. Inspired by:
173 # http://www.pydanny.com/cached-property.html

/home/.../lib/python2.7/site-packages/pandas/core/indexes/accessors.pyc in __new__(cls, data)
322 pass # we raise an attribute error anyway
323
--> 324 raise AttributeError("Can only use .dt accessor with datetimelike "
325 "values")

AttributeError: Can only use .dt accessor with datetimelike values

If the date fields are removed from the Spark DataFrame, the conversion works without problems.

I have checked that the data does not contain any null values, but it would also be good to know how to handle those if they appear.
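The last frames of the traceback suggest the likely mechanism. A minimal sketch of it in plain pandas (this reproduces the same AttributeError, assuming the Arrow path returns the date column as object dtype rather than datetime64[ns]):

```python
import datetime
import pandas as pd

# Assumed failure mode: the column arrives in pandas as object dtype
# holding plain datetime.date values (None shown for the null case).
s = pd.Series([datetime.date(2019, 1, 1), None, datetime.date(2019, 6, 30)])
print(s.dtype)  # object, not datetime64[ns]

# PySpark's _check_series_convert_date calls series.dt.date, and the .dt
# accessor only exists on datetimelike dtypes -- hence the AttributeError.
try:
    s.dt.date
except AttributeError as exc:
    print(exc)  # Can only use .dt accessor with datetimelike values

# Converting explicitly first avoids the error; None becomes NaT.
converted = pd.to_datetime(s)
print(converted.dt.date.tolist())
```

This is only an illustration of the error message at the pandas level, not of PySpark's internal Arrow code path.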

I am using Python 2.7 with:
  • pyspark==2.4.0
  • pyarrow==0.12.1
  • pandas==0.24.1

Best Answer

    This looks like a bug. pyarrow==0.12.1 and pyarrow==0.12.0 both have the same problem. Casting the Spark DataFrame column to TIMESTAMP worked for me:

    spark.sql('SELECT CAST(date_column as TIMESTAMP) FROM foo')

    Rolling back to pyarrow==0.11.0 also solved the problem.
    (My Python is 3.7.1 and pandas is 0.24.2.)
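An equivalent of the SQL cast through the DataFrame API (hypothetical column name, and it requires a running SparkSession) would be `spark_df.withColumn("start_date", F.col("start_date").cast("timestamp"))`. Why the cast sidesteps the error can be sketched on the pandas side: timestamp columns come back as datetime64[ns], which the `.dt` accessor accepts.

```python
import pandas as pd

# Sketch: a column delivered as timestamps (datetime64[ns]) rather than
# object-dtype dates supports the .dt accessor that PySpark relies on.
ts = pd.Series(pd.to_datetime(["2019-01-01", "2019-06-30"]))
print(ts.dtype)  # datetime64[ns]
print(ts.dt.date.tolist())
```

The trade-off is that the values arrive in pandas as full timestamps (midnight times) instead of plain dates.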

    Regarding "pandas - Error when converting a Spark DataFrame with dates to a Pandas DataFrame", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54905790/
