gpt4 book ai didi

google-bigquery - BigQuery 中的自联接问题

转载 作者:行者123 更新时间:2023-12-02 10:44:20 26 4
gpt4 key购买 nike

我正在尝试对当前行周围特定日期范围的值进行求和。由于 Bigquery 不支持 Windows 函数中的日期范围,因此我使用自连接,如下所示:

with test_data as (
select 1 val1, 7 val2, 'ord001' id, timestamp('2019-01-01 04:00:00') dt_order
union all
select 2 val1, 14 val2, 'ord002' id, timestamp('2019-01-02 05:00:00') dt_order
union all
select 3 val1, 21 val2, 'ord003' id, timestamp('2019-01-03 06:00:00') dt_order
)

,revenue_coeff as (
select
td.id,
td.val1 *
(select sum(td1.val2) / sum(td1.val1)
from test_data td1
where td1.dt_order >= timestamp_sub(td.dt_order, interval 24 hour) and
td1.dt_order < timestamp_add(td.dt_order, interval 6 minute)
)
from test_data td
)

select * from revenue_coeff

这个玩具查询工作得很好。但是,当我尝试使用真正的 BigQuery 表时,我收到“如果没有连接两侧字段相等的条件,则无法使用 LEFT OUTER JOIN”消息。如何在 BQ 中实现这样的查询?提前致谢!

最佳答案

以下适用于 BigQuery 标准 SQL

我将首先在帖子末尾回答您的问题 - 但随后会在帖子顶部解决您的陈述。所以...

I get a 'LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join' message. How can I implement such a query in BQ?

#standardSQL
WITH `project.dataset.test_data` AS (
SELECT 1 val1, 7 val2, 'ord001' id, TIMESTAMP('2019-01-01 04:00:00') dt_order UNION ALL
SELECT 1 val1, 14 val2, 'ord002' id, TIMESTAMP('2019-01-02 05:00:00') dt_order UNION ALL
SELECT 1 val1, 21 val2, 'ord003' id, TIMESTAMP('2019-01-03 06:00:00') dt_order
), revenue_coeff AS (
SELECT
td1.id,
td1.val1 * SUM(td2.val2) / SUM(td2.val1)
FROM `project.dataset.test_data` td1
CROSS JOIN `project.dataset.test_data` td2
WHERE td2.dt_order >= TIMESTAMP_SUB(td1.dt_order, INTERVAL 24 HOUR)
AND td2.dt_order < TIMESTAMP_ADD(td1.dt_order, INTERVAL 6 MINUTE)
GROUP BY td1.id, td1.val1
)
SELECT * FROM revenue_coeff

如您所见,您可以使用 CROSS JOIN 代替 LEFT JOIN,并将 ON 子句移至 WHERE 子句

Since Bigquery do not suppport dated ranges in windows functions ...

实际上,它确实支持 - 请参阅示例

#standardSQL
WITH `project.dataset.test_data` AS (
SELECT 1 val1, 7 val2, 'ord001' id, TIMESTAMP('2019-01-01 04:00:00') dt_order UNION ALL
SELECT 1 val1, 14 val2, 'ord002' id, TIMESTAMP('2019-01-02 05:00:00') dt_order UNION ALL
SELECT 1 val1, 21 val2, 'ord003' id, TIMESTAMP('2019-01-03 06:00:00') dt_order
), revenue_coeff AS (
SELECT id, val1 * SUM(val2) OVER(win) / SUM(val1) OVER(win)
FROM `project.dataset.test_data` td1
WINDOW win AS (ORDER BY UNIX_SECONDS(dt_order) RANGE BETWEEN 86400 PRECEDING AND 359 FOLLOWING )
)
SELECT * FROM revenue_coeff

如您所见 - 技巧在于使用 UNIX_SECONDS 函数将时间戳数据类型“转换”为 int

显然 - 我建议您选择第二个选项

关于google-bigquery - BigQuery 中的自联接问题,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59216896/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com