gpt4 book ai didi

sql - 雪花分析查询设计

转载 作者:行者123 更新时间:2023-12-04 07:47:46 25 4
gpt4 key购买 nike

我有一个棘手的查询设计要求,我尝试了不同类型/不同的分析函数组合,以从以下数据集中获得我的结果。我的另一个计划是编写存储过程,但是我想在改变方向之前联系专家组。
输入数据集:
enter image description here
带有组列的必需输出数据集:当 session ID 中的 session ID 发生更改并且如果我再次取回相同的 session ID 时,我必须使用不同的组。我尝试使用 LEAD/LAG 组合,但是无法获得以下所需的输出,一种或其他情况正在中断。
enter image description here
谢谢 !

最佳答案

SQL 语言的表达能力足以为复杂的需求找到声明式解决方案。
Snowflake 最近实现了 SQL 2016 标准条款: MATCH_RECOGNIZE ,旨在以非常直接的方式解决此类情况。

Identifying Sequences of Rows That Match a Pattern

In some cases, you might need to identify sequences of table rows that match a pattern. For example, you might need to:

  • Determine which users followed a specific sequence of pages and actions on your website before opening a support ticket or making a purchase.

  • Find the stocks with prices that followed a V-shaped or W-shaped recovery over a period of time.

  • Look for patterns in sensor data that might indicate an upcoming system failure.


数据准备:
CREATE OR REPLACE TABLE t
AS
SELECT 101 SESS_ID, 1 POL_ID, '2021-04-17 09:30:00'::DATE AS Trans_dt, 1 AS VERSION_ID
UNION ALL SELECT 101 SESS_ID, 1 POL_ID, '2021-04-17 09:35:00'::DATE AS Trans_dt, 2
UNION ALL SELECT 102 SESS_ID, 1 POL_ID, '2021-04-17 09:37:00'::DATE AS Trans_dt, 3
UNION ALL SELECT 102 SESS_ID, 1 POL_ID, '2021-04-17 09:38:00'::DATE AS Trans_dt, 4
UNION ALL SELECT 101 SESS_ID, 1 POL_ID, '2021-04-17 09:39:00'::DATE AS Trans_dt, 5
UNION ALL SELECT 101 SESS_ID, 1 POL_ID, '2021-04-17 09:40:00'::DATE AS Trans_dt, 6;
询问:
SELECT *
FROM t
MATCH_RECOGNIZE (
PARTITION BY POL_ID
ORDER BY VERSION_ID
MEASURES MATCH_NUMBER() AS group_id
--,CLASSIFIER() as cks
ALL ROWS PER MATCH
PATTERN (a+b*)
DEFINE a as sess_id = FIRST_VALUE(sess_id)
,b AS sess_id != FIRST_VALUE(sess_id)
) mr
ORDER BY POL_ID, VERSION_ID;
db<>fiddle demo(Oracle)
输出:
SESS_ID POL_ID  TRANS_DT    VERSION_ID  GROUP_ID
101 1 2021-04-17 1 1
101 1 2021-04-17 2 1
102 1 2021-04-17 3 1
102 1 2021-04-17 4 1
101 1 2021-04-17 5 2
101 1 2021-04-17 6 2
这个怎么运作:
  • 定义模式:(a+b*)这是 Perl 风格的正则表达式,a(一个或多个)b(零个或多个)
  • 定义模式组件 a(sess_id 与组的第一个元素相同),b(sess_id 与组的第一个元素不同)
  • 定义度量 MATCH_NUMBER() -“返回匹配的序列号”
  • 每个 POL_ID 执行此操作并使用 VERSION_ID作为排序列
  • 关于sql - 雪花分析查询设计,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/67138778/

    25 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com