
python - LEFT and RIGHT functions in PySpark SQL

Reposted. Author: 行者123. Updated: 2023-11-28 21:08:05

I am new to PySpark. I read a CSV file with pandas and created a temporary table with the registerTempTable function.

from pyspark.sql import SQLContext
from pyspark.sql import Row
import pandas as pd
sqlc = SQLContext(sc)

aa1 = pd.read_csv(r"D:\mck1.csv")  # raw string so the backslash is not read as an escape

aa2 = sqlc.createDataFrame(aa1)

aa2.show()

+--------+-------+----------+------------+---------+------------+-------------------+
|    City|     id|First_Name|Phone_Number| new_date|    new code|           New_date|
+--------+-------+----------+------------+---------+------------+-------------------+
|KOLKATTA|9000007|       AAA|  1111119411| 20080714|          13|2016-08-16 00:00:00|
|KOLKATTA|9000007|       BBB|  1111119421| 20080714|          13|2016-08-06 00:00:00|
|KOLKATTA|9000007|       CCC|  1111119461| 20080714|          13|2016-08-13 00:00:00|
|KOLKATTA|9000007|       DDD|  1111119471| 20080714|          13|2016-08-27 00:00:00|
|KOLKATTA|9000007|       EEE|  1111119491| 20080714|          13|2016-08-15 00:00:00|
|KOLKATTA|9111147|       FFF|  1111119401| 20080714|          13|2016-08-24 00:00:00|
|KOLKATTA|9585458|   FORMULA|  1111110112| 19990930|          13|2016-08-16 00:00:00|
|KOLKATTA|9569878|   APPLEII|  1111110132| 19990930|          13|2016-08-06 00:00:00|

aa3 = aa2.registerTempTable("mytable1")

sqlc.sql(""" select right(phone_number,4) from mytable1 """).show()

Now I am trying to pull out the last four characters of the phone number with right(phone_number,4), and I get the following error:

---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-18-07f08e3d0a8f> in <module>()
----> 1 sqlc.sql(""" select right(Phone_number,4) from mytable1 """).show()

C:\spark-1.4.1-bin-hadoop2.6\python\pyspark\sql\context.pyc in sql(self, sqlQuery)
500 [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
501 """
--> 502 return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
503
504 @since(1.0)

C:\spark-1.4.1-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py in __call__(self, *args)
536 answer = self.gateway_client.send_command(command)
537 return_value = get_return_value(answer, self.gateway_client,
--> 538 self.target_id, self.name)
539
540 for temp_arg in temp_args:

C:\spark-1.4.1-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
298 raise Py4JJavaError(
299 'An error occurred while calling {0}{1}{2}.\n'.
--> 300 format(target_id, '.', name), value)
301 else:
302 raise Py4JError(

Py4JJavaError: An error occurred while calling o55.sql.
: java.lang.RuntimeException: [1.9] failure: ``union'' expected but `right' found

select right(Phone_number,4) from mytable1
^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
at org.apache.spark.sql.catalyst.DefaultParserDialect.parse(ParserDialect.scala:67)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:145)

Why doesn't PySpark support the RIGHT and LEFT functions? How can I take the last four characters of a column?

Best answer

Looking at the documentation, have you tried the substring function?

pyspark.sql.functions.substring(str, pos, len)
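
To make the pos and len arguments concrete, here is a rough pure-Python model of how a SQL-style substring picks characters. The helper name sql_substring is hypothetical and not part of PySpark. It only sketches the common cases (1-based positive positions, negative positions counted from the end), and out-of-range positions may behave differently in Spark itself.

```python
def sql_substring(s, pos, length):
    """Simplified model of SQL-style substring(str, pos, len).

    Hypothetical helper for illustration only; it is not a PySpark function.
    """
    if pos > 0:
        start = pos - 1        # positive positions are 1-based
    elif pos < 0:
        start = len(s) + pos   # negative positions count back from the end
    else:
        start = 0              # assumption: position 0 means the first character
    end = start + length
    # Clamp both bounds so a slice never wraps around through negative indices.
    return s[max(start, 0):max(end, 0)]


last_four = sql_substring("abcdefg", -4, 4)   # RIGHT-style: characters from the end
first_four = sql_substring("abcdefg", 1, 4)   # LEFT-style: characters from the start
```

Under this model, a RIGHT(col, n) call corresponds to substring(col, -n, n) and a LEFT(col, n) call to substring(col, 1, n).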

Edit

Based on your comment, you can get the last four characters like this:

from pyspark.sql.functions import substring

df = sqlContext.createDataFrame([('abcdefg',)], ['s',])
df.select(substring(df.s, -4, 4).alias('s')).collect()

For more on python - LEFT and RIGHT functions in PySpark SQL, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/40548878/
