gpt4 book ai didi

snowflake-cloud-data-platform - 基于另一个表中范围的基于窗口的平均值

转载 作者:行者123 更新时间:2023-12-05 01:53:50 25 4
gpt4 key购买 nike

我有一张 table ,上面有这样的位置:

create or replace table data (
pos int not null,
val float not null,
constraint data_pk primary key (pos)
);

并且,像这样的范围类型表:

create or replace table ranges (
label varchar(32) not null,
left int not null,
right int not null,
constraint ranges_pk primary key (label)
);

范围如

        ('alpha', 11, 13),
('bravo', 11, 14),
('charlie', 11, 15),
('echo', 12, 18),
('delta', 12, 19),
('foxtrot', 13, 20)

对于每个标签,我需要在“数据”表中查找每个可能的 3 个子范围,取这 3 个子范围的平均值,然后对它们进行平均...

我想不出一个好的方式来描述我所追求的东西,所以我想我会展示我对“查理”的期望:

charlie 在选择中的结果应该是:

    ('charlie', 40.111), -- avg(avg(data[pos=11], data[pos=12], data[pos=13]), avg(data[pos=12], data[pos=13], data[pos=14]), avg(data[pos=13], data[pos=14], data[pos=15]))
-- -> avg(avg(31, 37, 41), avg(37, 41, 43), avg(41, 43, 47))
-- -> avg(36.333, 40.333, 43.667) -> 40.111

(对于类似的数据)

insert into data (pos, val) values
(1, 2), (2, 3), (3, 5), (4, 7), (5, 11), (6, 13), (7, 17), (8, 19),
(9, 23), (10, 29), (11, 31), (12, 37), (13, 41), (14, 43), (15, 47), (16, 53),
(17, 59), (18, 61), (19, 67), (20, 71), (21, 73), (22, 79), (23, 83), (24, 89),
(25, 97), (26, 101), (27, 103), (28, 107), (29, 109), (30, 113), (31, 127), (32, 131),
(33, 137), (34, 139), (35, 149), (36, 151), (37, 157), (38, 163), (39, 167), (40, 173),
(41, 179), (42, 181), (43, 191), (44, 193), (45, 197), (46, 199), (47, 211), (48, 223),
(49, 227), (50, 229), (51, 233), (52, 239), (53, 241), (54, 251);

有没有办法在 Snowflake SQL 中执行此操作?或者我必须求助于 python 来做到这一点?如果有帮助,我做了一个 gist有更多数据..

谢谢!

最佳答案

Is there a way to do this within Snowflake SQL? Or must I resort to python to do this?

SQL 语言的表现力足以处理这种情况。

这里的关键点是使用窗口大小为 3 的窗口平均值,然后使用平均移动平均值:

WITH cte AS (
SELECT r.label, r.left, r.right, d.val,
AVG(d.val) OVER(PARTITION BY r.label ORDER BY d.pos ROWS
BETWEEN 2 PRECEDING AND CURRENT ROW) AS r
FROM ranges r
JOIN data d
ON d.pos BETWEEN r.left AND r.right
QUALIFY ROW_NUMBER() OVER(PARTITION BY r.label ORDER BY d.pos) > 2
)
SELECT label, AVG(r) AS output
FROM cte
GROUP BY label
ORDER BY label;

输出:

enter image description here


说明的中间步骤:

WITH cte AS (
SELECT r.label, r.left, r.right, d.val,
AVG(d.val) OVER(PARTITION BY r.label ORDER BY d.pos
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS r
FROM ranges r
JOIN data d
ON d.pos BETWEEN r.left AND r.right
QUALIFY ROW_NUMBER() OVER(PARTITION BY r.label ORDER BY d.pos) > 2
)
SELECT *
FROM cte
ORDER BY label, r;

输出:

enter image description here

关于snowflake-cloud-data-platform - 基于另一个表中范围的基于窗口的平均值,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/71005155/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com