
mysql - Optimizing an SQL query that calculates the average for every minute of the day

Reposted. Author: 行者123. Updated: 2023-11-29 14:08:14

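I wrote an SQL routine that calculates the average value for every minute of the day from 5 days of data (the days are parameters of the routine) and inserts the result into another table. It takes too long to run, and I would like to know whether there is any way to optimize it.

The values I need for the average are all in the same table, SiteReading. To get the 5 values for the same minute on different dates, I join the per-day subsets of that table so that the hour and minute match and the values end up on the same row. I then add up the 5 values on each row, build a new table from the result, and insert it into the Baseline table, which stores these averages.

Here is the routine: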

DELIMITER //

CREATE PROCEDURE `calc_baseline` (IN `input_site_id` int, IN `day1` varchar(12), IN `day2` varchar(12), IN `day3` varchar(12), IN `day4` varchar(12), IN `day5` varchar(12))
BEGIN

insert into Baseline
SELECT
site_id,
contract_id,
temp_time as timestamp,
(sr1value + sr2value + sr3value + sr4value + sr5value) / 5 as value,
programme
FROM
(SELECT
distinct concat(cast(hour(temp_time) as char), ':', cast(minute(temp_time) as char)) as hourminute,
SR.site_id as site_id,
value as sr1value,
temp_time,
S.contract_id as contract_id,
programme
FROM
SiteReading SR
join Site S ON SR.site_id = S.site_id
join Contract C ON S.contract_id = C.contract_id
where
temp_time like concat(day1, '%')
and SR.site_id = input_site_id) sr1
join
(SELECT
concat(cast(hour(temp_time) as char), ':', cast(minute(temp_time) as char)) as hourminute,
value as sr2value
FROM
SiteReading
where
temp_time like concat(day2, '%')
and site_id = input_site_id) sr2 ON sr1.hourminute = sr2.hourminute
join
(SELECT
concat(cast(hour(temp_time) as char), ':', cast(minute(temp_time) as char)) as hourminute,
value as sr3value
FROM
SiteReading
where
temp_time like concat(day3, '%')
and site_id = input_site_id) sr3 ON sr1.hourminute = sr3.hourminute
join
(SELECT
concat(cast(hour(temp_time) as char), ':', cast(minute(temp_time) as char)) as hourminute,
value as sr4value
FROM
SiteReading
where
temp_time like concat(day4, '%')
and site_id = input_site_id) sr4 ON sr1.hourminute = sr4.hourminute
join
(SELECT
concat(cast(hour(temp_time) as char), ':', cast(minute(temp_time) as char)) as hourminute,
value as sr5value
FROM
SiteReading
where
temp_time like concat(day5, '%')
and site_id = input_site_id) sr5 ON sr1.hourminute = sr5.hourminute
limit 1440;

END//

DELIMITER ;

The relevant tables it reads from and writes to are:

- SiteReading:

CREATE TABLE `SiteReading` (
`site_id` int(11) NOT NULL,
`contract_id` int(11) DEFAULT NULL,
`temp_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`value` int(11) NOT NULL,
PRIMARY KEY (`site_id`,`temp_time`),
KEY `site_id` (`site_id`),
KEY `contract_id` (`contract_id`),
CONSTRAINT `SiteReading_ibfk_1` FOREIGN KEY (`site_id`) REFERENCES `Site` (`site_id`),
CONSTRAINT `SiteReading_ibfk_3` FOREIGN KEY (`contract_id`) REFERENCES `Contract` (`contract_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8$$

- Baseline:

CREATE TABLE `Baseline` (
`site_id` int(11) NOT NULL,
`contract_id` int(11) NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`value` int(11) NOT NULL,
`programme` int(11) NOT NULL,
PRIMARY KEY (`site_id`,`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8$$
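
  1. Because I need to get some additional values (site_id, contract_id, programme) to store in Baseline that are the same for each row, I was wondering whether I should do the insert statement in some other way? The thing is that none of the Baseline table's columns can be null.

  2. Maybe someone has other comments about this procedure: do I need to define anything else for this routine, such as ON DUPLICATE KEY UPDATE or other routine-related settings?

Thanks.

Best answer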

SELECT 
t1.site_id,
t1.contract_id,
t1.temp_time,
AVG(t2.value)
FROM
SiteReading AS t1
LEFT JOIN
SiteReading AS t2
ON
t1.site_id = t2.site_id
AND t2.temp_time BETWEEN startdate AND enddate
AND HOUR(t1.temp_time) = HOUR(t2.temp_time)
AND MINUTE(t1.temp_time) = MINUTE(t2.temp_time)
WHERE
t1.temp_time BETWEEN startdate AND enddate
GROUP BY
t1.site_id,
t1.contract_id,
t1.temp_time

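Not tested at all, but something like this might serve you better. The optimizations I made:

  1. Used a single self-join on the matching time intervals.
  2. Used GROUP BY with the AVG aggregate.
  3. Restricted the first table to the subset within the 5-day period (startdate, enddate).
  4. Did not join to Site or Contract. You have foreign keys to those tables in Baseline, so there is no need to pull extra data from them (I assume programme comes from one of them).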
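
As a hedged variation on the query above (not the answer's own code): if the goal is one averaged row per minute of the day, the self-join may not be needed at all, because grouping directly on the hour and minute already collapses the 5 days. The sketch below assumes the days are consecutive; the site id `1` and the date literals are hypothetical placeholders.

```sql
-- Sketch only: one row per (hour, minute) for a single site.
-- The half-open range on temp_time lets MySQL use the
-- PRIMARY KEY (site_id, temp_time) instead of scanning the table.
SELECT
    site_id,
    HOUR(temp_time)   AS hr,
    MINUTE(temp_time) AS mi,
    AVG(value)        AS avg_value
FROM
    SiteReading
WHERE
    site_id = 1                       -- hypothetical site id
    AND temp_time >= '2012-12-01'     -- hypothetical start (inclusive)
    AND temp_time <  '2012-12-06'     -- hypothetical end (exclusive)
GROUP BY
    site_id,
    HOUR(temp_time),
    MINUTE(temp_time);
```

Note that AVG ignores missing readings, so a minute with only 4 readings is averaged over 4 values, whereas the original routine divides by 5 and drops any minute that is missing on one of the days.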

1. Because I need to get some additional values (site_id, contract_id, programme) to store in Baseline that are the same for each row, I was wondering whether I should do the insert statement in some other way? The thing is that none of the Baseline table's columns can be null.

See #4.

2. Maybe someone has other comments about this procedure: do I need to define anything else for this routine, such as ON DUPLICATE KEY UPDATE or other routine-related settings?

I am not sure I fully understand what you are asking. Are you collecting multiple 5-day baselines over a longer period? If so, I don't see why you would need to update anything. If some temp_time values overlap (i.e., the procedure is run more often than once every 5 days over a 5-day window), you could keep a unique ID, or a timestamp identifying when the procedure was run, as part of Baseline's primary key to prevent duplicate keys on temp_time.
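
One way the run identifier idea could look, as a minimal sketch: the `run_id` column is hypothetical and not part of the original schema, and its type and default are assumptions.

```sql
-- Sketch only: make a per-run identifier part of Baseline's primary key
-- so that overlapping runs no longer collide on (site_id, timestamp).
ALTER TABLE Baseline
    ADD COLUMN run_id INT NOT NULL DEFAULT 0,   -- hypothetical column
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (site_id, run_id, `timestamp`);
```

After a change like this, an `INSERT INTO Baseline SELECT ...` without an explicit column list would no longer match the table's column count, so the routine's insert would need to name its target columns and supply a `run_id` value.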

Edit

I only just noticed that your days may not be consecutive. In that case, change these lines:

AND t2.temp_time BETWEEN startdate AND enddate

t1.temp_time BETWEEN startdate AND enddate

to:

AND DATE(t2.temp_time) IN (day1, day2, day3, day4, day5)

DATE(t1.temp_time) IN (day1, day2, day3, day4, day5)

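However, this creates a problem: you now have to do a full table scan of SiteReading for both the WHERE clause and the ON condition. To avoid that, consider normalizing the time intervals of your data set before storing it. For example, if you take 24*60 readings per day, each temp_time interval can be represented by an int from 1 to 1440, and each day by an int from 1 to 365 (366 in leap years). Then use those values in the WHERE and JOIN clauses.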
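
The normalization described above could be sketched roughly as follows. The column names `minute_of_day` and `day_of_year`, the index name, the site id, and the day numbers in the final query are all hypothetical; the sketch also assumes the data stays within a single year, since day-of-year values repeat across years.

```sql
-- Sketch only: store precomputed integer keys next to temp_time.
ALTER TABLE SiteReading
    ADD COLUMN minute_of_day SMALLINT UNSIGNED NULL,  -- 1..1440
    ADD COLUMN day_of_year   SMALLINT UNSIGNED NULL;  -- 1..366

-- Backfill existing rows; new rows would set these values on insert.
UPDATE SiteReading
SET minute_of_day = HOUR(temp_time) * 60 + MINUTE(temp_time) + 1,
    day_of_year   = DAYOFYEAR(temp_time);

-- Index the integer keys so the day filter does not need a full scan.
CREATE INDEX idx_site_day_minute
    ON SiteReading (site_id, day_of_year, minute_of_day);

-- Average per minute over five non-consecutive days.
SELECT
    site_id,
    minute_of_day,
    AVG(value) AS avg_value
FROM
    SiteReading
WHERE
    site_id = 1                                   -- hypothetical site id
    AND day_of_year IN (336, 337, 339, 340, 342)  -- hypothetical days
GROUP BY
    site_id,
    minute_of_day;
```

Because the filter compares plain integer columns rather than wrapping temp_time in DATE(), the optimizer can use the index for the day lookup.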

Regarding "mysql - Optimizing an SQL query that calculates the average for every minute of the day", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13994140/
