gpt4 book ai didi

sql - 带分组+时间戳列的最新数据的物化 View

转载 作者:行者123 更新时间:2023-12-05 04:42:24 25 4
gpt4 key购买 nike

我在 Bigquery 中对特征(或属性)进行建模。这是模型的示例

uid         string, link to owner id
uuid string, unique among all rows
trait_name string, name of the trait
trait_value string
added_at timestamp, when the trait was added

我正在尝试构建一个物化 View ,该 View 包含每个 uid 的每个特征的最新特征。我能够通过此查询获得结果:

WITH traits AS (
SELECT 'u1' uid, 'uu1' uuid, 't1' trait_name, 't1v1' trait_value, timestamp("2021-10-01 10:00:00") as added_at UNION ALL
SELECT 'u1' uid, 'uu2' uuid, 't1' trait_name, 't1v2' trait_value, timestamp("2021-10-02 10:00:00") as added_at UNION ALL
SELECT 'u1' uid, 'uu3' uuid, 't2' trait_name, 't2v1' trait_value, timestamp("2021-10-03 10:00:00") as added_at UNION ALL
SELECT 'u1' uid, 'uu4' uuid, 't2' trait_name, 't2v2' trait_value, timestamp("2021-10-04 10:00:00") as added_at UNION ALL
SELECT 'u2' uid, 'uu5' uuid, 't1' trait_name, 't1v1' trait_value, timestamp("2021-10-05 10:00:00") as added_at UNION ALL
SELECT 'u2' uid, 'uu6' uuid, 't1' trait_name, 't1v2' trait_value, timestamp("2021-10-06 10:00:00") as added_at UNION ALL
SELECT 'u2' uid, 'uu7' uuid, 't2' trait_name, 't2v1' trait_value, timestamp("2021-10-07 10:00:00") as added_at UNION ALL
SELECT 'u2' uid, 'uu8' uuid, 't2' trait_name, 't2v2' trait_value, timestamp("2021-10-08 10:00:00") as added_at
)
SELECT * FROM (
SELECT *,
MAX(added_at) OVER (PARTITION BY uid, trait_name) as latest_added_at FROM traits
) WHERE latest_added_at = added_at

# Row uid uuid trait_name trait_value added_at latest_added_at
# 1 u1 uu2 t1 t1v2 2021-10-02 10:00:00 UTC 2021-10-02 10:00:00 UTC
# 2 u1 uu4 t2 t2v2 2021-10-04 10:00:00 UTC 2021-10-04 10:00:00 UTC
# 3 u2 uu6 t1 t1v2 2021-10-06 10:00:00 UTC 2021-10-06 10:00:00 UTC
# 4 u2 uu8 t2 t2v2 2021-10-08 10:00:00 UTC 2021-10-08 10:00:00 UTC

但是我不能将它用于物化 View ,因为它们不支持它:

Materialized views do not support analytic functions or WITH OFFSET.

我也尝试过使用连接

SELECT * FROM traits t
JOIN (
SELECT
uid,
trait_name,
MAX(added_at) AS max_added_at
FROM traits GROUP BY uid, trait_name
) grouped
ON t.uid = grouped.uid
AND t.trait_name = grouped.trait_name
AND t.added_at = grouped.max_added_at

但是他们也不支持

Materialized views queries may not reference the same table more than once. Table traits was seen multiple times.

有没有办法把它做成物化 View ?

最佳答案

物化 View 有一些限制,可能无法实现(有关更多信息,请参阅 this 公共(public)文档)。

因此目前看来您正在尝试实现的目标并未得到完全支持,使用 ARRAY_AGG 似乎是最接近的可能,缺点之一是结果是数组类型,而不是值本身。

我设法通过此查询检索到预期结果:

CREATE MATERIALIZED VIEW test_traits.mv_sample_traits as
SELECT uid,
ARRAY_AGG(uuid IGNORE NULLS ORDER BY added_at DESC, uuid DESC, trait_value DESC LIMIT 1) as uuid,
trait_name,
ARRAY_AGG(trait_value IGNORE NULLS ORDER BY added_at DESC, uuid DESC, trait_value DESC LIMIT 1) as trait_value,
ARRAY_AGG(added_at IGNORE NULLS ORDER BY added_at DESC, uuid DESC, trait_value DESC LIMIT 1) as added_at,
MAX(added_at) AS latest_added_at,
FROM test_traits.traits
GROUP BY uid,trait_name


/*
SELECT * FROM test_traits.mv_sample_traits;
*/

sample

请注意,代码仅供引用,您需要运行测试来验证它是否适合您的用例,不能保证这会按预期工作,因此我不推荐将其用于生产。

关于sql - 带分组+时间戳列的最新数据的物化 View ,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/69857647/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com