gpt4 book ai didi

performance - 在 BigQuery 中有效地获取每个日期每个 ID 过去 6 个月内所有以前日期的数组

转载 作者:行者123 更新时间:2023-12-05 05:38:12 25 4
gpt4 key购买 nike

我有一个非常大的表“DATES_EVENTS”(20 T),如下所示:

ID      DATE   
1 '2022-04-01'
1 '2022-03-02'
1 '2022-03-01'
2 '2022-05-01'
3 '2021-12-01'
3 '2021-11-11'
3 '2020-11-11'
3 '2020-10-01'

我希望每行获取最多 6 个月的所有过去日期(每个用户)。

我想要的 table :

ID      DATE                   DATE_list
1 '2022-04-01' ['2022-04-01','2022-03-02','2022-03-01']
1 '2022-03-02' ['2022-03-02','2022-03-01']
1 '2022-03-01' ['2022-03-01']
2 '2022-05-01' ['2022-05-01']
3 '2021-12-01' ['2021-12-01','2021-11-11']
3 '2021-11-11' ['2021-11-11']
3 '2020-11-11' ['2020-11-11','2020-10-01']
3 '2020-10-01' ['2020-10-01']

我有一个不限日期的解决方案:

SELECT 
ID, DATE, ARRAY_AGG(DATE) OVER (PARTITION BY ID ORDER BY DATE) as DATE_list
FROM
DATES_EVENTS

但在长达 6 个月的有限时间内,我没有有效的解决方案:

SELECT
distinct A.ID, A.DATE, ARRAY_AGG(B.DATE) OVER (PARTITION BY B.ID ORDER BY B.DATE) as DATE_list
FROM
DATES_EVENTS A
INNER JOIN
DATES_EVENTS B
ON
A.ID=B.ID
AND B.DATE BETWEEN DATE_SUB(A.DATE, INTERVAL 180 DAY) AND A.DATE

** ruffly a solution

有人知道做我需要做的事情的好方法吗?

最佳答案

考虑以下方法

select id, date, array(
select day
from t.date_list day
where day <= date
order by day desc
) as date_list
from (
select *, array_agg(date) over win as date_list
from dates_events
window win as (
partition by id
order by extract(year from date) * 12 + extract(month from date)
range between 5 preceding and current row
)
) t

如果应用于您问题中的示例数据 - 输出为

enter image description here

如果(正如我在您的问题中注意到的那样)180 天适合您代替 6 个月 - 您可以使用以下更简单的版本

select *, array_agg(date) over win as date_list
from dates_events
window win as (
partition by id
order by unix_date(date)
range between current row and 179 following
)

关于performance - 在 BigQuery 中有效地获取每个日期每个 ID 过去 6 个月内所有以前日期的数组,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/72975366/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com