pyspark: difference in hours between two date columns

Reposted. Author: 行者123. Updated: 2023-12-04 02:00:49

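I want to calculate the number of hours between two date columns in pyspark.
I could only find how to calculate the number of days between dates.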

dfs_4.show()


+--------------------+--------------------+
| request_time| max_time|
+--------------------+--------------------+
|2017-11-17 00:18:...|2017-11-20 23:59:...|
|2017-11-17 00:07:...|2017-11-20 23:59:...|
|2017-11-17 00:35:...|2017-11-20 23:59:...|
|2017-11-17 00:10:...|2017-11-20 23:59:...|
|2017-11-17 00:03:...|2017-11-20 23:59:...|
|2017-11-17 00:45:...|2017-11-20 23:59:...|
|2017-11-17 00:35:...|2017-11-20 23:59:...|
|2017-11-17 00:59:...|2017-11-20 23:59:...|
|2017-11-17 00:28:...|2017-11-20 23:59:...|
|2017-11-17 00:11:...|2017-11-20 23:59:...|
|2017-11-17 00:13:...|2017-11-20 23:59:...|
|2017-11-17 00:42:...|2017-11-20 23:59:...|
|2017-11-17 00:07:...|2017-11-20 23:59:...|
|2017-11-17 00:40:...|2017-11-20 23:59:...|
|2017-11-17 00:15:...|2017-11-20 23:59:...|
|2017-11-17 00:05:...|2017-11-20 23:59:...|
|2017-11-17 00:50:...|2017-11-20 23:59:...|
|2017-11-17 00:40:...|2017-11-20 23:59:...|
|2017-11-17 00:25:...|2017-11-20 23:59:...|
|2017-11-17 00:35:...|2017-11-20 23:59:...|
+--------------------+--------------------+

Calculating the number of days:
from pyspark.sql import functions as F
dfs_5 = dfs_4.withColumn('date_diff', F.datediff(F.to_date(dfs_4.max_time), F.to_date(dfs_4.request_time)))

dfs_5.show()

+--------------------+--------------------+---------+
| request_time| max_time|date_diff|
+--------------------+--------------------+---------+
|2017-11-17 00:18:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:07:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:35:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:10:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:03:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:45:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:35:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:59:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:28:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:11:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:13:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:42:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:07:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:40:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:15:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:05:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:50:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:40:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:25:...|2017-11-20 23:59:...| 3|
|2017-11-17 00:35:...|2017-11-20 23:59:...| 3|
+--------------------+--------------------+---------+

How can I do the same thing in hours?
Thanks for your help.

Best answer

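You can use hour to extract the hour from a datetime field and then simply subtract one from the other into a new column. There is one case to handle: when the time difference spans more than one day, you need to add the full days in between. Note that this compares only the hour of day, so minutes and seconds are ignored. So I would create the date_diff column as you did and then try this: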

from pyspark.sql import functions as F

# date_diff exists on dfs_5 (created in the question), not on dfs_4
dfs_5 = dfs_5.withColumn('hours_diff', (dfs_5.date_diff * 24) +
                         F.hour(dfs_5.max_time) - F.hour(dfs_5.request_time))
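
To make the arithmetic in the formula above concrete, here is a minimal plain-Python sketch (standard library only, no Spark) that applies the same "days times 24 plus hour-of-day difference" rule to a single pair of timestamps and compares it with the exact elapsed hours. The function names `hours_diff_by_parts` and `hours_diff_exact` are hypothetical, and the sample timestamps are made up for illustration, since the seconds in the table above are truncated.

```python
from datetime import datetime


def hours_diff_by_parts(start: datetime, end: datetime) -> int:
    """Calendar-day difference * 24 plus hour-of-day difference (minutes ignored)."""
    # Difference between the calendar dates, analogous to datediff on to_date values
    date_diff = (end.date() - start.date()).days
    return date_diff * 24 + end.hour - start.hour


def hours_diff_exact(start: datetime, end: datetime) -> float:
    """Elapsed time in hours, including minutes and seconds."""
    return (end - start).total_seconds() / 3600


# Hypothetical values: the seconds are assumed, not taken from the source table
start = datetime(2017, 11, 17, 0, 18, 0)
end = datetime(2017, 11, 20, 23, 59, 0)

by_parts = hours_diff_by_parts(start, end)  # 3 * 24 + 23 - 0
exact = hours_diff_exact(start, end)        # 3 days, 23 hours, 41 minutes

print(by_parts, round(exact, 2))
```

The two results differ because the first one only looks at the hour of day. If the exact figure is what you need in PySpark, a commonly used alternative is to subtract the `F.unix_timestamp` values of the two columns and divide by 3600; that is a different technique from the one in this answer.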

Regarding pyspark: difference in hours between two date columns, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47419601/
