gpt4 book ai didi

mysql - 如何查找上一条记录[n-per-group max(timestamp) < timestamp]?

转载 作者:行者123 更新时间:2023-11-29 12:52:59 26 4
gpt4 key购买 nike

我有一个包含时间序列传感器数据的大表。大型是指分布在被监控的各个 channel 中的从几千到 10M 的记录。对于某种传感器类型,我需要计算当前读数和上一个读数之间的时间间隔,即找到当前读数之前的最大时间戳。

我想到了显而易见的方法,每种方法都在 Core i5 上针对 40k 条目的 channel 进行了测量:

相关子查询

SELECT collect.*, prev.timestamp AS prev_timestamp
FROM data AS collect
LEFT JOIN data AS prev ON prev.channel_id = collect.channel_id AND prev.timestamp = (
SELECT MAX(timestamp)
FROM data
WHERE data.channel_id = collect.channel_id AND data.timestamp < collect.timestamp
)
WHERE collect.channel_id=14 AND collect.timestamp >= 0
ORDER BY collect.timestamp

时间(执行、获取)11秒、21秒

计划

+----+--------------------+---------+------+------------------------------+---------+---------+-------------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+------+------------------------------+---------+---------+-------------------------+-------+--------------------------+
| 1 | PRIMARY | collect | ref | ts_uniq,IDX_ADF3F36372F5A1AA | ts_uniq | 5 | const | 45820 | Using where |
| 1 | PRIMARY | prev | ref | ts_uniq,IDX_ADF3F36372F5A1AA | ts_uniq | 13 | const,func | 1 | Using index |
| 2 | DEPENDENT SUBQUERY | data | ref | ts_uniq,IDX_ADF3F36372F5A1AA | ts_uniq | 5 | nils.collect.channel_id | 2495 | Using where; Using index |
+----+--------------------+---------+------+------------------------------+---------+---------+-------------------------+-------+--------------------------+

反加入

SELECT d1.*, d2.timestamp AS prev_timestamp
FROM data d1
LEFT JOIN data d2 ON
d2.channel_id=14 AND
d2.timestamp < d1.timestamp
LEFT JOIN data d3 ON
d3.channel_id=14 AND
d3.timestamp < d1.timestamp AND
d3.timestamp > d2.timestamp
WHERE
d3.timestamp IS NULL AND
d1.channel_id=14
ORDER BY timestamp

时间 12秒,21秒

计划

+----+-------------+-------+------+------------------------------+---------+---------+-------+-------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------------+---------+---------+-------+-------+--------------------------------------+
| 1 | SIMPLE | d1 | ref | ts_uniq,IDX_ADF3F36372F5A1AA | ts_uniq | 5 | const | 45820 | Using where |
| 1 | SIMPLE | d2 | ref | ts_uniq,IDX_ADF3F36372F5A1AA | ts_uniq | 5 | const | 47194 | Using index |
| 1 | SIMPLE | d3 | ref | ts_uniq,IDX_ADF3F36372F5A1AA | ts_uniq | 5 | const | 47194 | Using where; Using index; Not exists |
+----+-------------+-------+------+------------------------------+---------+---------+-------+-------+--------------------------------------+

我想出了另一种模式,我称之为朴素计数

SELECT current.*, prev.timestamp AS prev_timestamp FROM
(
SELECT data.*, @r1 := @r1+1 AS rownum from data
CROSS JOIN (SELECT @r1 := 0) AS vars
WHERE channel_id=14
ORDER BY timestamp
) AS current
LEFT JOIN
(
SELECT data.*, @r2 := @r2+1 AS rownum from data
CROSS JOIN (SELECT @r2 := 0) AS vars
WHERE channel_id=14
ORDER BY timestamp
) AS prev
ON current.rownum = prev.rownum+1

时间 1.1秒(这实际上是最快的!)

计划

+----+-------------+------------+--------+------------------------------+---------+---------+-----+-------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+------------------------------+---------+---------+-----+-------+----------------+
| 1 | PRIMARY | <derived2> | ALL | | | | | 24475 | |
| 1 | PRIMARY | <derived4> | ALL | | | | | 24475 | |
| 4 | DERIVED | <derived5> | system | | | | | 1 | |
| 4 | DERIVED | data | ref | ts_uniq,IDX_ADF3F36372F5A1AA | ts_uniq | 5 | | 45820 | Using where |
| 5 | DERIVED | | | | | | | | No tables used |
| 2 | DERIVED | <derived3> | system | | | | | 1 | |
| 2 | DERIVED | data | ref | ts_uniq,IDX_ADF3F36372F5A1AA | ts_uniq | 5 | | 45820 | Using where |
| 3 | DERIVED | | | | | | | | No tables used |
+----+-------------+------------+--------+------------------------------+---------+---------+-----+-------+----------------+

由于查询可能在像 RasPi 这样的小型平台上运行,因此性能至关重要 - 几秒钟是最可接受的。

我的问题:最后一种方法对于每组最优秀的人来说是一个好的方法还是有更好的方法?相关子查询是否像经历的那样慢?

最佳答案

最后一种使用变量的方法是合理的。您也可以尝试:

SELECT collect.*,
(select max(timestamp)
from data
where data.channel_id = collect.channel_id AND data.timestamp < collect.timestamp
) AS prev_timestamp
FROM data AS collect
WHERE collect.channel_id = 14 AND collect.timestamp >= 0
ORDER BY collect.timestamp;

此外,创建索引:collect(channel_id, timestamp)。

编辑:

以下可能是最快的:

  select d.*,
if(@channel_id = channel_id, @prev_timestamp, NULL) as prev_timestamp,
@channel_id := channel_id, @prev_timestamp = timestamp
from data d cross join
(select @channel_id := 0, @prev_timestamp := 0) vars
where collect.channel_id = 14 AND collect.timestamp >= 0
order by channel_id, timestamp;

关于mysql - 如何查找上一条记录[n-per-group max(timestamp) < timestamp]?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/24457442/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com