python - Converting datetime to date in PySpark

Reposted. Author: 行者123. Updated: 2023-12-05 02:45:38

I have a DataFrame with two columns, "date" (dtype: string) and "modified" (dtype: bigint), as shown below:

+-------------------------------------+-------------+
| date| modified|
+-------------------------------------+-------------+
|Mon, 18 Dec 2017 22:52:37 +0000 (UTC)|1513637587000|
| Mon, 18 Dec 2017 22:52:23 +0000|1513637587000|
| Mon, 18 Dec 2017 22:52:03 +0000|1513637587000|
|Mon, 18 Dec 2017 22:51:43 +0000 (UTC)|1513637527000|
| Mon, 18 Dec 2017 22:51:31 +0000|1513637527000|
| Mon, 18 Dec 2017 22:51:38 +0000|1513637527000|
| Mon, 18 Dec 2017 22:51:09 +0000|1513637526000|
| Mon, 18 Dec 2017 22:50:55 +0000|1513637466000|
| Mon, 18 Dec 2017 22:50:35 +0000|1513637466000|
| Mon, 18 Dec 2017 17:49:35 -0500|1513637407000|
+-------------------------------------+-------------+

How can I extract YYYY-mm-dd (2017-12-18) from either of the two columns? I tried unix_timestamp and to_timestamp, but neither worked; both return null.

Best answer

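The modified column holds a Unix timestamp in milliseconds. Divide it by 1000 to get seconds, use from_unixtime to turn the result into a timestamp string, then cast that to the date type: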

import pyspark.sql.functions as F

# 'modified' is in epoch milliseconds; from_unixtime expects seconds, hence the division by 1000.
df2 = df.withColumn('parsed_date', F.from_unixtime(F.col('modified')/1000).cast('date'))

df2.show()
+--------------------+-------------+-----------+
| date| modified|parsed_date|
+--------------------+-------------+-----------+
|Mon, 18 Dec 2017 ...|1513637587000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637587000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637587000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637527000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637527000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637527000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637526000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637466000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637466000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637407000| 2017-12-18|
+--------------------+-------------+-----------+
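
The answer above only uses the modified column. As a rough illustration of the logic for both columns, here is a minimal plain-Python sketch (standard library only, not Spark). The helper names date_from_epoch_millis and date_from_rfc2822 are made up for this example, and the choice to report every date in UTC is an assumption rather than something the question specifies.

```python
from datetime import datetime, timezone


def date_from_epoch_millis(millis: int) -> str:
    """Convert epoch milliseconds to a YYYY-mm-dd string, read in UTC (assumed)."""
    # Epoch values in 'modified' are milliseconds, so divide by 1000 to get seconds.
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).strftime("%Y-%m-%d")


def date_from_rfc2822(text: str) -> str:
    """Parse strings such as 'Mon, 18 Dec 2017 22:52:37 +0000 (UTC)'."""
    # Drop an optional trailing comment such as ' (UTC)' before parsing.
    cleaned = text.split(" (")[0].strip()
    # %a and %b match English day and month names under the default C locale.
    parsed = datetime.strptime(cleaned, "%a, %d %b %Y %H:%M:%S %z")
    # Normalise to UTC so rows with different offsets are comparable.
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(date_from_epoch_millis(1513637587000))
    print(date_from_rfc2822("Mon, 18 Dec 2017 22:52:37 +0000 (UTC)"))
    print(date_from_rfc2822("Mon, 18 Dec 2017 17:49:35 -0500"))
```

All three sample values from the question resolve to 2017-12-18 when read in UTC. Note that from_unixtime in Spark renders the timestamp in the session time zone, so for values close to midnight the resulting date may differ from the UTC date computed here.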

For more on converting datetime to date in PySpark, see the similar question on Stack Overflow: https://stackoverflow.com/questions/65864509/
