
apache-spark - Grouping PySpark totals into weeks


I recently got help with a similar query in DATE_ADD or DATE_DIFF error when grouping dates in BigQuery, but I would like to know how to do this in PySpark, as I am still fairly new to it. My data looks like this:

day         bitcoin_total   dash_total
2009-01-03  1               0
2009-01-09  14              0
2009-01-10  61              0

The ideal result would show the date of the start of each week (either Monday or Sunday, whichever is easier):
day         bitcoin_total   dash_total
2008-12-28  1               0
2009-01-04  75              0

The code below returns the weeks as week numbers, and the totals seem to be off. I can't reproduce the totals that .agg(sum()) returns, and I can't manage to add a second total (dash_total) either; I tried .col("dash_total"). Is there a way to group the days into weeks?
from pyspark.sql.functions import weekofyear, sum

(df
 .groupBy(weekofyear("day").alias("date_by_week"))
 .agg(sum("bitcoin_total"))
 .orderBy("date_by_week")
 .show())

I am running Spark on Databricks.

Best Answer

Try this approach using Spark's date_sub and next_day functions.

Explanation:

date_sub(
  next_day(col("day"), "sunday"), // get the date of the next Sunday
  7)                              // subtract a week from that date to get the start of the current week
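To see what that expression does, here is a minimal sketch (assuming an active SparkSession named spark, which is not part of the original answer) applied to a single date from the question's data: next_day moves 2009-01-03 forward to the following Sunday, 2009-01-04, and date_sub(..., 7) then steps back to 2008-12-28, the Sunday that starts that week.

from pyspark.sql.functions import col, date_sub, next_day

# Single sample date from the question; "spark" is assumed to be an existing SparkSession.
sample = spark.createDataFrame([("2009-01-03",)], ["day"])

(sample
 .withColumn("next_sunday", next_day(col("day"), "sunday"))     # -> 2009-01-04
 .withColumn("week_strt_day", date_sub(col("next_sunday"), 7))  # -> 2008-12-28
 .show())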

Example:

In PySpark:

from pyspark.sql.functions import col, date_sub, next_day, sum

df = sc.parallelize([("2009-01-03", "1", "0"), ("2009-01-09", "14", "0"),
                     ("2009-01-10", "61", "0")]).toDF(["day", "bitcoin_total", "dash_total"])

(df.withColumn("week_strt_day", date_sub(next_day(col("day"), "sunday"), 7))
   .groupBy("week_strt_day")
   .agg(sum("bitcoin_total").cast("int").alias("bitcoin_total"),
        sum("dash_total").cast("int").alias("dash_total"))
   .orderBy("week_strt_day")
   .show())

Result:
+-------------+-------------+----------+
|week_strt_day|bitcoin_total|dash_total|
+-------------+-------------+----------+
|   2008-12-28|            1|         0|
|   2009-01-04|           75|         0|
+-------------+-------------+----------+

In Scala:

import org.apache.spark.sql.functions._

val df = Seq(("2009-01-03", "1", "0"), ("2009-01-09", "14", "0"),
             ("2009-01-10", "61", "0")).toDF("day", "bitcoin_total", "dash_total")

df.withColumn("week_strt_day", date_sub(next_day('day, "sunday"), 7))
  .groupBy("week_strt_day")
  .agg(sum("bitcoin_total").cast("int").alias("bitcoin_total"),
       sum("dash_total").cast("int").alias("dash_total"))
  .orderBy("week_strt_day")
  .show()

Result:
+-------------+-------------+----------+
|week_strt_day|bitcoin_total|dash_total|
+-------------+-------------+----------+
|   2008-12-28|            1|         0|
|   2009-01-04|           75|         0|
+-------------+-------------+----------+
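The question accepted either Sunday or Monday as the start of the week. If a Monday-based week is preferred, the same pattern works by targeting "monday" instead of "sunday"; the sketch below is a minimal PySpark variant of the example above (it reuses the same df), not a separate method from the original answer.

from pyspark.sql.functions import col, date_sub, next_day, sum

# Monday-based week start: next_day finds the following Monday, date_sub steps back 7 days.
(df.withColumn("week_strt_day", date_sub(next_day(col("day"), "monday"), 7))
   .groupBy("week_strt_day")
   .agg(sum("bitcoin_total").cast("int").alias("bitcoin_total"),
        sum("dash_total").cast("int").alias("dash_total"))
   .orderBy("week_strt_day")
   .show())

From Spark 2.3 onward, date_trunc("week", col("day")) should also give a Monday-based week start; it returns a timestamp, so wrap it in to_date if a plain date is needed.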

Regarding apache-spark - Grouping PySpark totals into weeks, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57350353/
