
python - pyspark : How to apply to a dataframe value another value depending on date in another dataframe


My first dataframe, df, contains a start_date and a value; my second dataframe, df_v, contains only dates.

My df:

+-------------------+-----+
|         start_date|value|
+-------------------+-----+
|2019-03-17 00:00:00|   35|
|2019-05-20 00:00:00|   40|
|2019-06-03 00:00:00|   10|
|2019-07-01 00:00:00|   12|
+-------------------+-----+

My df_v:

+-------------------+
|               date|
+-------------------+
|2019-02-01 00:00:00|
|2019-04-10 00:00:00|
|2019-06-14 00:00:00|
+-------------------+

What I want is a new df_v, where each date carries the sum of the values of all intervals that started on or before it:

+-------------------+-------------+
|               date|      v_value|
+-------------------+-------------+
|2019-02-01 00:00:00|            0|
|2019-04-10 00:00:00|    (0+35) 35|
|2019-06-14 00:00:00|(35+40+10) 85|
+-------------------+-------------+
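For reference, here is a minimal sketch that builds the two example frames above, assuming an active SparkSession bound to a variable named spark (the variable name and the to_timestamp parsing are my additions, not part of the question):

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp

spark = SparkSession.builder.getOrCreate()

# df: one row per interval start, with its value
df = spark.createDataFrame(
    [("2019-03-17 00:00:00", 35),
     ("2019-05-20 00:00:00", 40),
     ("2019-06-03 00:00:00", 10),
     ("2019-07-01 00:00:00", 12)],
    ["start_date", "value"],
).withColumn("start_date", to_timestamp("start_date"))

# df_v: the dates that need a cumulative v_value
df_v = spark.createDataFrame(
    [("2019-02-01 00:00:00",),
     ("2019-04-10 00:00:00",),
     ("2019-06-14 00:00:00",)],
    ["date"],
).withColumn("date", to_timestamp("date"))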

I tried to make something like this work:

from pyspark.sql import functions as F
from pyspark.sql.functions import lead
from pyspark.sql.window import Window

# Attach each row's next start_date so every row describes an interval
df = df.withColumn("lead", lead(F.col("start_date"), 1).over(Window.orderBy("start_date")))

# Collect both frames to the driver and compare row by row
for r_v in df_v.rdd.collect():
    for r in df.rdd.collect():
        if (r_v.date >= r.start_date) and (r_v.date < r.lead):
            df_v = df_v.withColumn('v_value',
            ...

Best answer

This can be done with a join followed by an aggregation.

from pyspark.sql.functions import sum, when

# Join: attach every interval that started on or before each date
joined_df = df_v.join(df, df.start_date <= df_v.date, 'left')
joined_df.show()  # View the joined result

# Aggregation: sum the matched values per date, treating nulls as 0
joined_df \
    .groupBy(joined_df.date) \
    .agg(sum(when(joined_df.value.isNull(), 0).otherwise(joined_df.value)).alias('val')) \
    .show()
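On the sample data this yields val = 0, 35 and 85 for the three dates, matching the desired v_value column above. As a style variation (my own sketch, not the answerer's code), the when/otherwise null handling can be swapped for coalesce, and sum can be aliased to avoid shadowing Python's builtin:

from pyspark.sql.functions import coalesce, lit, sum as sum_

df_v.join(df, df.start_date <= df_v.date, 'left') \
    .groupBy(df_v.date) \
    .agg(sum_(coalesce(df.value, lit(0))).alias('v_value')) \
    .orderBy(df_v.date) \
    .show()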

Regarding "python - pyspark: How to apply to a dataframe value another value depending on date in another dataframe", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58248774/
