
python - Is there any function that can help me convert date and string formats in PySpark?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:56:09

I am currently working in PySpark and know very little about this technology. My DataFrame looks like this:

id  dob         var1
1   13-02-1976  aab@dfsfs
2   01-04-2000  bb@NAm
3   28-11-1979  adam11@kjfd
4   30-01-1955  rehan42@ggg

The output I want is:

id  dob         var1         age  var2
1   13-02-1976  aab@dfsfs    43   aab
2   01-04-2000  bb@NAm       19   bb
3   28-11-1979  adam11@kjfd  39   adam11
4   30-01-1955  rehan42@ggg  64   rehan42

What I have done so far:

df= df.select( df.id.cast('int').alias('id'),                                      
df.dob.cast('date').alias('dob'),
df.var1.cast('string').alias('var1'))

But I don't think dob is being converted correctly.

df= df.withColumn('age', F.datediff(F.current_date(), df.dob))

Best answer

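As you said, the dob column is not being cast correctly. A plain cast('date') expects ISO-style yyyy-MM-dd strings, so dd-MM-yyyy values need an explicit pattern. Please try this.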

from pyspark.sql.functions import col, unix_timestamp, to_date
import pyspark.sql.functions as F

df2 = df.withColumn('date_in_dateFormat', to_date(unix_timestamp(F.col('dob'), 'dd-MM-yyyy').cast("timestamp")))
df2.show()
+---+----------+-----------+------------------+
| id|       dob|       var1|date_in_dateFormat|
+---+----------+-----------+------------------+
|  1|13-02-1976|  aab@dfsfs|        1976-02-13|
|  2|01-04-2000|     bb@NAm|        2000-04-01|
|  3|28-11-1979|adam11@kjfd|        1979-11-28|
|  4|30-01-1955|rehan42@ggg|        1955-01-30|
+---+----------+-----------+------------------+

df2.printSchema()
root
|-- id: integer (nullable = true)
|-- dob: string (nullable = true)
|-- var1: string (nullable = true)
|-- date_in_dateFormat: date (nullable = true)

# datediff returns the difference in days, so 'age' here is a day count, not years
df3 = df2.withColumn('age', F.datediff(F.current_date(), df2.date_in_dateFormat))
df3.show()
+---+----------+-----------+------------------+-----+
| id|       dob|       var1|date_in_dateFormat|  age|
+---+----------+-----------+------------------+-----+
|  1|13-02-1976|  aab@dfsfs|        1976-02-13|15789|
|  2|01-04-2000|     bb@NAm|        2000-04-01| 6975|
|  3|28-11-1979|adam11@kjfd|        1979-11-28|14405|
|  4|30-01-1955|rehan42@ggg|        1955-01-30|23473|
+---+----------+-----------+------------------+-----+

split_col = F.split(df['var1'], '@')
df4 = df3.withColumn('Var2', split_col.getItem(0))
df4.show()
+---+----------+-----------+------------------+-----+-------+
| id|       dob|       var1|date_in_dateFormat|  age|   Var2|
+---+----------+-----------+------------------+-----+-------+
|  1|13-02-1976|  aab@dfsfs|        1976-02-13|15789|    aab|
|  2|01-04-2000|     bb@NAm|        2000-04-01| 6975|     bb|
|  3|28-11-1979|adam11@kjfd|        1979-11-28|14405| adam11|
|  4|30-01-1955|rehan42@ggg|        1955-01-30|23473|rehan42|
+---+----------+-----------+------------------+-----+-------+

Regarding "python - Is there any function that can help me convert date and string formats in PySpark?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56016134/
