
SQL Server - Aggregate data by minute over multiple days

Reposted. Author: 行者123. Updated: 2023-12-04 01:15:28

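Context

I am using Microsoft SQL Server 2016.

There is a database table "Raw_data" that contains the status of machines together with the time each status started. There are several machines, and each machine writes its status to the database several times per minute.

To reduce the data volume, I am trying to aggregate the data into 1-minute blocks and store those for further analysis. Because of capacity limits, I want to run this transformation logic every few minutes (for example as a scheduled SQL Server Agent job), delete the raw data, and keep only the aggregated data.

To simplify the example, let's assume "Raw_data" looks like this: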

╔════╦════════════╦════════╦═════════════════════╗
║ id ║ fk_machine ║ status ║ created_at ║
╠════╬════════════╬════════╬═════════════════════╣
║ 1 ║ 2222 ║ 0 ║ 2020-08-19 22:15:00 ║
║ 2 ║ 2222 ║ 3 ║ 2020-08-19 22:15:30 ║
║ 3 ║ 2222 ║ 5 ║ 2020-08-19 23:07:00 ║
║ 4 ║ 2222 ║ 1 ║ 2020-08-20 00:20:00 ║
║ 5 ║ 2222 ║ 0 ║ 2020-08-20 00:45:00 ║
║ 6 ║ 2222 ║ 5 ║ 2020-08-20 02:20:00 ║
╚════╩════════════╩════════╩═════════════════════╝

There are also the database tables "Dim_date" and "Dim_time", which look like this:

╔══════════╦══════════════╗
║ datekey ║ date_iso8601 ║
╠══════════╬══════════════╣
║ 20200101 ║ 2020-01-01 ║
║ 20200102 ║ 2020-01-02 ║
║ ... ║ ... ║
║ 20351231 ║ 2035-12-31 ║
╚══════════╩══════════════╝

╔═════════╦══════════╦═════════════════╗
║ timekey ║ time_iso ║ min_lower_bound ║
╠═════════╬══════════╬═════════════════╣
║ 1 ║ 00:00:01 ║ 00:00:00 ║
║ 2 ║ 00:00:02 ║ 00:00:00 ║
║ ... ║ ... ║ ... ║
║ 80345 ║ 08:03:45 ║ 08:03:00 ║
║ ... ║ ... ║ ... ║
║ 134504 ║ 13:45:04 ║ 13:45:00 ║
║ 134505 ║ 13:45:05 ║ 13:45:00 ║
║ ... ║ ... ║ ... ║
║ 235959 ║ 23:59:59 ║ 23:59:00 ║
╚═════════╩══════════╩═════════════════╝

The result should look like this:

╔══════════════╦═════════════════╦════════════╦════════╦═══════════════╗
║ date_iso8601 ║ min_lower_bound ║ fk_machine ║ status ║ total_seconds ║
╠══════════════╬═════════════════╬════════════╬════════╬═══════════════╣
║ 2020-08-19 ║ 22:15:00 ║ 2222 ║ 0 ║ 30 ║
║ 2020-08-19 ║ 22:15:00 ║ 2222 ║ 3 ║ 30 ║
║ 2020-08-19 ║ 22:16:00 ║ 2222 ║ 3 ║ 60 ║
║ 2020-08-19 ║ 22:17:00 ║ 2222 ║ 3 ║ 60 ║
║ ... ║ ... ║ ... ║ ... ║ ... ║
║ 2020-08-19 ║ 23:06:00 ║ 2222 ║ 3 ║ 60 ║
║ 2020-08-19 ║ 23:07:00 ║ 2222 ║ 5 ║ 60 ║
║ 2020-08-19 ║ 23:08:00 ║ 2222 ║ 5 ║ 60 ║
║ ... ║ ... ║ ... ║ ... ║ ... ║
║ 2020-08-20 ║ 00:19:00 ║ 2222 ║ 5 ║ 60 ║
║ 2020-08-20 ║ 00:20:00 ║ 2222 ║ 1 ║ 60 ║
║ 2020-08-20 ║ 00:21:00 ║ 2222 ║ 1 ║ 60 ║
║ ... ║ ... ║ ... ║ ... ║ ... ║
║ 2020-08-20 ║ 00:44:00 ║ 2222 ║ 1 ║ 60 ║
║ 2020-08-20 ║ 00:45:00 ║ 2222 ║ 0 ║ 60 ║
╚══════════════╩═════════════════╩════════════╩════════╩═══════════════╝

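Attempt

To calculate the duration of each status per minute, I used a CTE with LEAD to fetch the start date and time of the next status in the table, then joined the result with the dimension tables and aggregated it.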

WITH CTE_MACHINE_STATES(START_DATEKEY,
                        START_TIMEKEY,
                        FK_MACHINE,
                        STATUS,
                        END_DATEKEY,
                        END_TIMEKEY)
     AS (SELECT CAST(CONVERT(CHAR(8), CREATED_AT, 112) AS INT), -- ISO: yyyymmdd
                CONVERT(INT, REPLACE(CONVERT(CHAR(8), CREATED_AT, 108), ':', '')), -- hhmmss
                FK_MACHINE,
                STATUS,
                CAST(CONVERT(CHAR(8), LEAD(CREATED_AT, 1) OVER(PARTITION BY FK_MACHINE
                                                               ORDER BY CREATED_AT), 112) AS INT),
                CONVERT(INT, REPLACE(CONVERT(CHAR(8), LEAD(CREATED_AT, 1) OVER(PARTITION BY FK_MACHINE
                                                                               ORDER BY CREATED_AT), 108), ':', ''))
         FROM RAW_DATA)
SELECT DATE_ISO8601,
       MIN_LOWER_BOUND,
       FK_MACHINE,
       STATUS,
       SUM(1) AS TOTAL_SECONDS -- Duration: one Dim_time row per second
FROM CTE_MACHINE_STATES
     CROSS JOIN DIM_DATE
     CROSS JOIN DIM_TIME
WHERE TIMEKEY >= START_TIMEKEY AND
      TIMEKEY < END_TIMEKEY AND
      END_TIMEKEY IS NOT NULL AND -- last entry per machine and status
      DATEKEY BETWEEN START_DATEKEY AND END_DATEKEY
GROUP BY FK_MACHINE,
         STATUS,
         DATE_ISO8601,
         MIN_LOWER_BOUND
ORDER BY DATE_ISO8601,
         MIN_LOWER_BOUND;
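
As a rough illustration of the LEAD step only, the following Python sketch pairs each reading with the next reading of the same machine. The helper name with_end_time and the tuple layout (fk_machine, status, created_at) are assumptions made for this example; the rows are the sample data from "Raw_data".

```python
from datetime import datetime
from itertools import groupby

# Sample rows from "Raw_data" as (fk_machine, status, created_at).
raw_data = [
    (2222, 0, datetime(2020, 8, 19, 22, 15, 0)),
    (2222, 3, datetime(2020, 8, 19, 22, 15, 30)),
    (2222, 5, datetime(2020, 8, 19, 23, 7, 0)),
    (2222, 1, datetime(2020, 8, 20, 0, 20, 0)),
    (2222, 0, datetime(2020, 8, 20, 0, 45, 0)),
    (2222, 5, datetime(2020, 8, 20, 2, 20, 0)),
]


def with_end_time(rows):
    """Hypothetical helper: append the next created_at of the same machine.

    Roughly what LEAD(created_at) OVER (PARTITION BY fk_machine
    ORDER BY created_at) returns; the last row per machine gets None.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    result = []
    for _, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        followers = group[1:] + [None]
        for current, following in zip(group, followers):
            end_time = following[2] if following is not None else None
            result.append((current[0], current[1], current[2], end_time))
    return result


states = with_end_time(raw_data)
for state in states:
    print(state)
```

The last reading of each machine has no successor, which is why the query filters on END_TIMEKEY IS NOT NULL.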

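Problem

If a status lasts past midnight, it is not aggregated correctly. For example, the status with id = 3 in "Raw_data" starts at 23:07 and ends at 00:20 the next day. Here the start timekey (230700) is greater than end_timekey (2000), so no timekey can satisfy both TIMEKEY >= START_TIMEKEY and TIMEKEY < END_TIMEKEY, and the status is excluded from the result. I have not yet found a way to change the join condition so that such long-lasting statuses are included while still producing the expected result.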
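
The comparison can be replayed outside the database. The sketch below is a hypothetical Python model of the filter, not part of the original query: timekey mirrors the hhmmss integer built in the CTE, and all_timekeys stands in for the keys of "Dim_time" (assumed here to have one row per second of the day).

```python
from datetime import datetime


def timekey(ts):
    """hhmmss as an integer, e.g. 23:07:00 -> 230700 and 00:20:00 -> 2000."""
    return ts.hour * 10000 + ts.minute * 100 + ts.second


# Stand-in for Dim_time: one key per second of the day (assumption).
all_timekeys = [
    h * 10000 + m * 100 + s
    for h in range(24)
    for m in range(60)
    for s in range(60)
]

start = datetime(2020, 8, 19, 23, 7, 0)  # id = 3
end = datetime(2020, 8, 20, 0, 20, 0)    # id = 4, the next status

start_key = timekey(start)
end_key = timekey(end)

# Same predicate as the WHERE clause: TIMEKEY >= START_TIMEKEY AND TIMEKEY < END_TIMEKEY
matching = [k for k in all_timekeys if start_key <= k < end_key]

print(start_key, end_key, len(matching))
```

Because the time part wraps around at midnight while the date part is compared separately, the two range conditions cannot both hold for any second of this status.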

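PS: As I wrote above, status updates normally arrive every few seconds. The problem therefore only occurs in edge cases, for example when a machine is switched off.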


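Solution

Unfortunately, I did not receive an answer on how to get the expected result using the date and time dimension tables. However, dnoeth's approach with a recursive CTE works well, so I adopted it: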

WITH cte_outer AS (
    SELECT fk_machine,
           status,
           created_at,
           DATEADD(minute, DATEDIFF(minute, '2000', created_at), '2000') AS min_lower_bound, --truncates seconds from start time
           LEAD(created_at) OVER(PARTITION BY fk_machine ORDER BY created_at) AS end_time
    FROM raw_data
),
cte_recursive AS (
    SELECT fk_machine,
           status,
           min_lower_bound,
           end_time,
           CASE
               WHEN end_time > DATEADD(minute, 1, min_lower_bound)
                   THEN DATEDIFF(s, created_at, DATEADD(minute, 1, min_lower_bound))
               ELSE DATEDIFF(s, created_at, end_time)
           END AS total_seconds
    FROM cte_outer

    UNION ALL

    SELECT fk_machine,
           status,
           DATEADD(minute, 1, min_lower_bound), -- next time segment (minute)
           end_time,
           CASE
               WHEN end_time >= DATEADD(minute, 2, min_lower_bound)
                   THEN 60
               ELSE DATEDIFF(s, DATEADD(minute, 1, min_lower_bound), end_time)
           END
    FROM cte_recursive
    WHERE end_time > DATEADD(minute, 1, min_lower_bound)
)
SELECT min_lower_bound,
       fk_machine,
       status,
       total_seconds
FROM cte_recursive
ORDER BY fk_machine,
         min_lower_bound

Best answer

This is a use case for a recursive CTE that advances the start minute derived from created_at by one minute per recursion:

with cte as
 (
   select fk_machine
         ,status
         ,start_minute
         ,end_time
         ,case
            when end_time > dateadd(minute, 1,start_minute)
            then datediff(s, created_at, dateadd(minute, 1,start_minute))
            else datediff(s, created_at, end_time )
          end as seconds
   from
    (
      select fk_machine
            ,status
            ,created_at
            ,dateadd(minute, datediff(minute, 0, created_at), 0) as start_minute
            ,lead(created_at)
             over (PARTITION BY fk_machine
                   order by created_at) as end_time
      from tab
    ) as dt

   union all

   select fk_machine
         ,status
         ,dateadd(minute, 1,start_minute)
         ,end_time
         ,case
            when end_time >= dateadd(minute, 2,start_minute)
            then 60
            else datediff(s, dateadd(minute, 1,start_minute), end_time)
          end
   from cte
   where end_time > dateadd(minute, 1,start_minute)
 )
select * from cte
order by 1,3,4;

See fiddle
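
To make the per-minute splitting easier to follow, here is a minimal Python sketch of the same idea using a loop instead of recursion. It is an illustration, not a translation of the query: the function name split_by_minute is hypothetical, timestamps are assumed to have whole seconds, and a status without an end time is not handled.

```python
from datetime import datetime, timedelta

ONE_MINUTE = timedelta(minutes=1)


def split_by_minute(start, end):
    """Return (minute_start, seconds) for every minute touched by [start, end).

    The first segment runs from start to the next minute boundary (or to
    end, whichever comes first); each later segment starts on a boundary.
    """
    segments = []
    minute = start.replace(second=0, microsecond=0)  # truncate to the minute
    cursor = start
    while cursor < end:
        next_minute = minute + ONE_MINUTE
        segment_end = min(end, next_minute)
        segments.append((minute, int((segment_end - cursor).total_seconds())))
        minute = next_minute
        cursor = next_minute
    return segments


# Status with id = 2: 22:15:30 until 23:07:00 on the same day.
same_day = split_by_minute(datetime(2020, 8, 19, 22, 15, 30),
                           datetime(2020, 8, 19, 23, 7, 0))

# Status with id = 3: 23:07:00 until 00:20:00 on the next day.
over_midnight = split_by_minute(datetime(2020, 8, 19, 23, 7, 0),
                                datetime(2020, 8, 20, 0, 20, 0))

print(same_day[0], same_day[-1])
print(over_midnight[0], over_midnight[-1])
```

Because the loop works on full timestamps rather than on separate date and time keys, the segment that crosses midnight needs no special handling, which is also why the recursive CTE avoids the problem described in the question.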

Regarding "SQL Server - Aggregate data by minute over multiple days", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63510962/
