
sql - pyspark.sql.utils.ParseException : "\nmismatched input" in PYSPARKSQL


I need to transform a table using a pivot. I have tried this in Spark SQL from PySpark but am running into a problem.

Store   Week   xCount
-----   ----   ------
102     1      96
101     1      138
105     1      37
109     1      59
101     2      282
102     2      212
105     2      78
109     2      97
105     3      60
102     3      123
101     3      220
109     3      87

I want it to come out as a pivot table, like this:

Store   1     2     3
-----   ---   ---   ---
101     138   282   220
102     96    212   123
105     37
109

The code I am using is below:

from pyspark import SparkContext, SparkConf
from builtins import int
#from org.spark.com.PySparkDemo import data
from pyspark.sql import Row
from pyspark.sql.context import SQLContext

conf = SparkConf().setAppName("FileSystem").setMaster("local")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Parse each comma-separated line into a Row of (store, week, xcount)
a = sc.textFile("C:/Users/ashok.darsi/Desktop/Data1.txt")
b = a.map(lambda x: x.split(",")).map(lambda x: Row(store=int(x[0]), week=int(x[1]), xcount=int(x[2])))
c = sqlContext.createDataFrame(b)
#c.show()

c.registerTempTable("d")

# The PIVOT query that fails to parse
e = sqlContext.sql("select * from d pivot(max(xcount)) for week in ([1],[2],[3])) as pivt")

e.show()

The error I am getting:

select * from d pivot(max(xcount)) for week in ([1],[2],[3])) as pivt
---------------------^^^

pyspark.sql.utils.ParseException: "\nmismatched input '(' expecting {<EOF>, ',', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}(line 1, pos 21)\n\n== SQL ==\nselect * from d pivot(max(xcount)) for week in ([1],[2],[3])) as pivt\n---------------------^^^\n"
SUCCESS: The process with PID 8668 (child process of PID 2192) has been terminated.
SUCCESS: The process with PID 2192 (child process of PID 4940) has been terminated.
SUCCESS: The process with PID 4940 (child process of PID 11192) has been terminated.

Best Answer

I don't think PIVOT is supported natively by HiveQL / Spark SQL in the version you are running, which is why the parser rejects the query at the pivot( token.
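(As an aside, Spark 2.4 and later did add a PIVOT clause to Spark SQL. A minimal sketch of that syntax, assuming you can upgrade; note that the aggregate and the FOR ... IN list share a single pair of parentheses:)

# Assumes Spark >= 2.4, where Spark SQL gained a PIVOT clause
e = sqlContext.sql("""
    select * from d
    pivot (
        max(xcount) for week in (1, 2, 3)
    )
""")
e.show()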

You can use the DataFrame-specific pivot function here instead (df below is the DataFrame your code builds as c):

df.groupBy("store").pivot("week").max("xcount").show()

+-----+---+---+---+
|store| 1| 2| 3|
+-----+---+---+---+
| 101|138|282|220|
| 102| 96|212|123|
| 105| 37| 78| 60|
| 109| 59| 97| 87|
+-----+---+---+---+
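As a side note, when the distinct pivot values are known ahead of time, you can pass them to pivot explicitly, which saves Spark an extra pass over the data to collect them; a small sketch:

# Passing the values up front skips the extra job that collects distinct weeks
df.groupBy("store").pivot("week", [1, 2, 3]).max("xcount").show()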

With Spark SQL, you have to spell out each week value manually and take the max of a CASE expression per week:

select
    store,
    max(case when week = 1 then xcount else null end) as `1`,
    max(case when week = 2 then xcount else null end) as `2`,
    max(case when week = 3 then xcount else null end) as `3`
from d
group by store
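If there are many weeks, you can also build those CASE expressions from Python instead of writing them by hand; a sketch that reuses sqlContext and the temp table d from the question (the weeks list here is an assumed example):

# Assumed example list of weeks; one max(case ...) column is generated per week
weeks = [1, 2, 3]
cases = ", ".join(
    "max(case when week = {w} then xcount end) as `{w}`".format(w=w)
    for w in weeks
)
sqlContext.sql("select store, {} from d group by store".format(cases)).show()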

Regarding sql - pyspark.sql.utils.ParseException : "\nmismatched input" in PYSPARKSQL, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48003957/
