python - Spark 2.0 toPandas method

Reposted. Author: 太空狗. Updated: 2023-10-30 01:36:59

I have a Spark DataFrame that looks like this:

topics.show(2)
+-----+--------------------+--------------------+--------------------+
|topic| termIndices| termWeights| topics_words|
+-----+--------------------+--------------------+--------------------+
| 0|[0, 39, 68, 43, 5...|[0.06362107696025...|[, management, sa...|
| 1|[3, 1, 8, 6, 4, 1...|[0.03164821806301...|[objectives, lear...|
+-----+--------------------+--------------------+--------------------+
only showing top 2 rows

However, when I try to convert it to a pandas DataFrame with the following call, which worked in Spark 1.6, I get an error.

topics.toPandas()

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-165-4c1231b68769> in <module>()
----> 1 topics.toPandas()

/Users/i854319/spark2/python/pyspark/sql/dataframe.pyc in toPandas(self)
1440 """
1441 import pandas as pd
-> 1442 return pd.DataFrame.from_records(self.collect(), columns=self.columns)
1443
1444 ##########################################################################################

/Users/i854319/spark2/python/pyspark/sql/dataframe.pyc in collect(self)
307 [Row(age=2, name=u'Alice'), Row(age=5, name=u'Bob')]
308 """
--> 309 with SCCallSiteSync(self._sc) as css:
310 port = self._jdf.collectToPython()
311 return list(_load_from_socket(port, BatchedSerializer(PickleSerializer())))

/Users/i854319/spark2/python/pyspark/traceback_utils.pyc in __enter__(self)
70 def __enter__(self):
71 if SCCallSiteSync._spark_stack_depth == 0:
---> 72 self._context._jsc.setCallSite(self._call_site)
73 SCCallSiteSync._spark_stack_depth += 1
74

AttributeError: 'NoneType' object has no attribute 'setCallSite'

So I am not sure whether this method has a bug in Spark 2.0.2, or whether something else went wrong.

Best answer

Copying my answer from a related question:

There is an unresolved issue:

https://issues.apache.org/jira/browse/SPARK-27335?jql=text%20~%20%22setcallsite%22

The poster suggests forcing the DataFrame's backend to sync with the Spark context:

df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
df._sc = spark._sc

This worked for us, and hopefully it works in other cases as well.
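
The workaround can be wrapped in a small helper, sketched below. The function name `resync_dataframe` and its return value are illustrative assumptions, not part of PySpark. It touches the same private attributes as the two lines quoted from the issue (`_sc`, `_jsparkSession`, `sql_ctx`), plus `_jsc`, which the traceback shows to be None. These are internals and may not exist under the same names in other Spark versions.

```python
def resync_dataframe(df, spark):
    """Point `df` at the JVM handles of the live session `spark`.

    Hypothetical helper: it applies the two assignments suggested in the
    issue and reports whether the DataFrame's context looked stale.
    Relies on private PySpark attributes that may change between releases.
    """
    # The traceback fails on `self._context._jsc.setCallSite(...)`, which
    # means the DataFrame's SparkContext has no JVM handle (`_jsc` is None).
    was_stale = getattr(df._sc, "_jsc", None) is None

    # The two assignments from the suggested workaround, unchanged.
    df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
    df._sc = spark._sc

    return was_stale
```

Assuming `spark` is the currently active SparkSession, one would call `resync_dataframe(topics, spark)` and then retry `topics.toPandas()`. A return value of True indicates the DataFrame was still attached to a context whose JVM handle was gone.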

Regarding python - Spark 2.0 toPandas method, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42290766/
