gpt4 book ai didi

sql - 在事件发生之前将事件表中的数据与变更日志表中的最新数据连接起来

转载 作者:行者123 更新时间:2023-12-04 08:57:41 24 4
gpt4 key购买 nike

我在找 JOIN将用户更改日志样式表中的数据转换为具有匹配 ID 的事件表
表格如下:
项目_事件
模式

timestamp       TIMESTAMP
event_id STRING
user_id STRING
data STRING
示例数据
| timestamp                   | event_id  | user_id | data             |
|-----------------------------|-----------|---------|------------------|
| 2020-08-22 17:01:18.807 UTC | hHZuTE8Y= | ABC123 | {"some":"json" } |
| 2020-08-20 16:57:28.022 UTC | tF5Gky8Q= | ZXY432 | {"foo":"item" } |
| 2020-08-15 16:44:25.607 UTC | 1dOU8pOo= | ABC123 | {"bar":"val" } |
users_changelog
模式
timestamp       TIMESTAMP
event_id STRING
operation STRING
user_id STRING
data STRING
示例数据
| timestamp                   | event_id  | operation | user_id | data                |
|-----------------------------|-----------|-----------|---------|---------------------|
| 2020-08-30 12:50:59.036 UTC | mGdNKy+o= | DELETE | ABC123 | {"name":"removed" } |
| 2020-08-20 16:50:59.036 UTC | mGdNKy+o= | UPDATE | ABC123 | {"name":"final" } |
| 2020-08-05 20:45:36.936 UTC | mIICo9LY= | UPDATE | ZXY432 | {"name":"asdf" } |
| 2020-08-05 20:45:21.023 UTC | nEDKyCks= | UPDATE | ABC123 | {"name":"other" } |
| 2020-08-03 12:40:49.036 UTC | GxnbUqQ0= | CREATE | ABC123 | {"name":"initial" } |
| 1970-01-01 00:00:00 UTC | 1y+6fVWo= | IMPORT | ZXY432 | {"name":"test" } |
注意:操作可以是“创建”、“更新”、“删除”或“导入”。由于用户可以多次更新,因此可以有多个具有相同 user_id 的行
目标是显示与用户表中匹配 ID 的最近操作相关联的 event_id 和数据列。使用示例数据,预期结果将是:
| event_id  | event_data       | user_id | user_data         |
|-----------|------------------|---------|-------------------|
| hHZuTE8Y= | {"some":"json" } | ABC123 | {"name":"final" } |
| tF5Gky8Q= | {"foo":"item" } | ZXY432 | {"name":"asdf" } |
| 1dOU8pOo= | {"bar":"val" } | ABC123 | {"name":"other" } |
我尝试了以下操作,但它会产生重复的行(更改日志表中具有匹配 ID 的每一行一个):
SELECT
events.event_id as event_id,
events.data as event_data,
users.user_id as user_id,
users.data as user_data
FROM my_project.my_dataset.project_events as events
LEFT JOIN my_project.my_dataset.users_changelog as users
ON events.user_id = users.user_id AND users.timestamp <= events.timestamp

最佳答案

下面是 BigQuery 标准 SQL

#standardSQL
SELECT event_id, data AS event_data, user_id,
( SELECT data
FROM UNNEST(arr) rec
WHERE rec.timestamp < t.timestamp
ORDER BY rec.timestamp DESC
LIMIT 1
) AS user_data
FROM (
SELECT
ANY_VALUE(events).*,
ARRAY_AGG(STRUCT(users.data, users.timestamp)) arr
FROM `my_project.my_dataset.project_events` AS events
LEFT JOIN `my_project.my_dataset.users_changelog` AS users
ON events.user_id = users.user_id
GROUP BY FORMAT('%t', events)
) t
如果适用于您问题中的样本数据 - 输出是
Row event_id        event_data          user_id     user_data    
1 hHZuTE8Y= {"some":"json" } ABC123 {"name":"final" }
2 tF5Gky8Q= {"foo":"item" } ZXY432 {"name":"asdf" }
3 1dOU8pOo= {"bar":"val" } ABC123 {"name":"other" }

关于sql - 在事件发生之前将事件表中的数据与变更日志表中的最新数据连接起来,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/63731538/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com