
apache-spark - Converting an RDD to a DataFrame: AttributeError: 'RDD' object has no attribute 'toDF'

Reposted. Author: 行者123. Last updated: 2023-12-04 04:44:39

This question already has an answer here:

'PipelinedRDD' object has no attribute 'toDF' in PySpark (2 answers)

Closed 3 years ago.



from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("myApp").setMaster("local")
sc = SparkContext(conf=conf)

a = sc.parallelize([[1, "a"], [2, "b"], [3, "c"], [4, "d"], [5, "e"]]).toDF(['ind', "state"])

a.show()

The result is:
Traceback (most recent call last):
  File "/Users/ktemlyakov/messing_around/SparkStuff/mock_maersk_data.py", line 7, in <module>
    a = sc.parallelize([[1, "a"], [2, "b"], [3, "c"], [4, "d"], [5, "e"]]).toDF(['ind', "state"])
AttributeError: 'RDD' object has no attribute 'toDF'

What am I missing?

Best answer

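The SQLContext is missing; it needs to be created. toDF is not defined on RDD itself: PySpark attaches it to the RDD class when a SQLContext (or, in Spark 2.0 and later, a SparkSession) is instantiated, so calling it before that point raises the AttributeError. The following code works: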

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext

conf = SparkConf().setAppName("myFirstApp").setMaster("local")
sc = SparkContext(conf=conf)
# Instantiating SQLContext is the step the question's code was missing.
sqlContext = SQLContext(sc)

a = sc.parallelize([[1, "a"], [2, "b"], [3, "c"], [4, "d"], [5, "e"]]).toDF(['ind', "state"])

a.show()
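
Why merely creating a context object makes a method appear on RDD can be surprising. As far as I understand the PySpark source, the context's constructor assigns toDF onto the RDD class as a side effect. The sketch below imitates that pattern in plain Python so it runs without Spark; ToyRDD, ToySession, to_df and create_frame are hypothetical names invented for this illustration and are not part of PySpark.

```python
# Toy illustration only: ToyRDD and ToySession are hypothetical stand-ins,
# not PySpark classes. They mimic how a session constructor can attach a
# method to another class as a side effect.

class ToyRDD:
    """Holds rows; deliberately defines no to_df method of its own."""

    def __init__(self, rows):
        self.rows = list(rows)


class ToySession:
    """Creating a session attaches to_df to ToyRDD."""

    def __init__(self):
        session = self

        def to_df(rdd, columns):
            # Delegate to the session that installed this method.
            return session.create_frame(rdd.rows, columns)

        # Patch the class, so existing and future instances gain the method.
        ToyRDD.to_df = to_df

    def create_frame(self, rows, columns):
        # Represent a "data frame" as a list of column-name -> value dicts.
        return [dict(zip(columns, row)) for row in rows]


rdd = ToyRDD([[1, "a"], [2, "b"]])
had_method_before = hasattr(rdd, "to_df")  # no session created yet

ToySession()  # side effect: ToyRDD.to_df now exists
had_method_after = hasattr(rdd, "to_df")
frame = rdd.to_df(["ind", "state"])
```

The session does not need to be assigned to a variable for the patch to take effect, which matches the working code above, where sqlContext is created but never referenced again.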

Edit:

In Spark 2.0, the same result can be achieved as follows:
from pyspark import SparkConf
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").config(conf=SparkConf()).getOrCreate()

a = spark.createDataFrame([[1, "a"], [2, "b"], [3, "c"], [4, "d"], [5, "e"]], ['ind', "state"])
a.show()

Regarding apache-spark - Converting an RDD to a DataFrame: AttributeError: 'RDD' object has no attribute 'toDF', we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47341048/
