gpt4 book ai didi

mysql - 优化从同一个表中提取的多列的查询

转载 作者:可可西里 更新时间:2023-11-01 08:07:56 27 4
gpt4 key购买 nike

这是 another question here on SO 的后续.

我有这两个数据库表(省略了更多表):

acquisitions (acq)
id {PK}
id_cu {FK}
datetime
{ Unique Constraint: id_cu - datetime }

data
id {PK}
id_acq {FK acquisitions}
id_meas
id_elab
value

一切可能 iddatetime全部 索引。

当然,我会 不是 更改数据库结构我需要以这种方式提取数据:
  • 按日期时间分组的行
  • 每列对应data.value对于选定的 acq.id_cu - data.id_meas - data.id_elab组合。 (见帖子底部的注释)
  • 如果某些列的数据缺失但在日期时间中存在其他列的数据,则允许空单元格

  • 我当前的查询是这样构建的(请参阅 SO question ):
    SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (

    SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3
    FROM acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1

    UNION

    SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3
    FROM acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6

    UNION

    SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3
    FROM acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8

    ) AS T
    WHERE datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
    GROUP BY datetime

    这里只检索 3 列,但正如我所说,列经常超过 50。

    它完美无缺,但我想知道它是否可以优化速度。

    这是 MySQL EXPLAIN EXTENDED对于上面的查询:
    +----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    +----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
    | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 82466 | 100.00 | Using where; Using temporary; Using filesort |
    | 2 | DERIVED | acquisitions | ref | PRIMARY,id_cu,ix_acquisitions_id_cu | id_cu | 4 | | 18011 | 100.00 | |
    | 2 | DERIVED | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
    | 3 | UNION | acquisitions | ref | PRIMARY,id_cu,ix_acquisitions_id_cu | ix_acquisitions_id_cu | 4 | | 20864 | 100.00 | |
    | 3 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
    | 4 | UNION | acquisitions | ref | PRIMARY,id_cu,ix_acquisitions_id_cu | id_cu | 4 | | 31848 | 100.00 | |
    | 4 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
    | NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | |
    +----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
    8 rows in set, 1 warning (8.24 sec)

    目前(编辑:今天检查)39 万次采集和 920 万个数据值(并且还在增长),提取一个包含 59 列的表需要大约 10 分钟。我知道先例软件需要长达 1 小时才能提取数据。

    感谢您耐心阅读到这里:)

    更新

    在丹尼斯回答后,我尝试了他的更改 1. 和 2.,这是新查询的结果:
    SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (

    SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3
    FROM acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
    AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"

    UNION ALL

    SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3
    FROM acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
    AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"

    UNION ALL

    SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3
    FROM acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
    AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"

    ) AS T GROUP BY datetime

    这里是新的 EXPLAIN EXTENDED :
    +----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    +----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
    | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 51997 | 100.00 | Using temporary; Using filesort |
    | 2 | DERIVED | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 14827 | 100.00 | Using where |
    | 2 | DERIVED | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
    | 3 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 18663 | 100.00 | Using where |
    | 3 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
    | 4 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 13260 | 100.00 | Using where |
    | 4 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
    | NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | |
    +----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
    8 rows in set, 1 warning (3.01 sec)

    毫无疑问,性能获得了不错的提升

    更新 (2)

    这是加点 3.
    EXPLAIN EXTENDED SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (

    SELECT acquisitions.datetime AS datetime, MAX(data.value) AS v1, NULL AS v2, NULL AS v3
    FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
    WHERE acquisitions.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 2
    AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
    GROUP BY datetime

    UNION ALL

    SELECT acquisitions.datetime AS datetime, NULL AS v1, MAX(data.value) AS v2, NULL AS v3
    FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
    WHERE acquisitions.id_cu = 4 AND data.id_meas = 1 AND data.id_elab = 2
    AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
    GROUP BY datetime

    UNION ALL

    SELECT acquisitions.datetime AS datetime, NULL AS v1, NULL AS v2, MAX(data.value) AS v3
    FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
    WHERE acquisitions.id_cu = 8 AND data.id_meas = 1 AND data.id_elab = 2
    AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
    GROUP BY datetime

    ) AS T GROUP BY datetime;

    这是 EXPLAIN EXTENDED 的结果
    +----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
    +----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
    | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 51997 | 100.00 | Using temporary; Using filesort |
    | 2 | DERIVED | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 14827 | 100.00 | Using where |
    | 2 | DERIVED | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
    | 3 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 18663 | 100.00 | Using where |
    | 3 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
    | 4 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 13260 | 100.00 | Using where |
    | 4 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
    | NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | |
    +----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
    8 rows in set, 1 warning (3.06 sec)

    稍微慢一点,这是否应该受益于大量coulmns?我会试试看...

    更新 (3)

    我试过有和没有 MAX(data.value)... GROUP BY datetime而且,在 60 列查询中,我得到了更好的结果 .结果因尝试而异,这是其中之一。
  • 原查询9m12.144s
  • 与丹尼斯的 1.2. 4m6.597s
  • 与丹尼斯的 1. , 2.3. 4m0.210s

  • 所需时间减少了约 57%。

    更新 (4)

    我尝试了 Andiry 解决方案,但它比 Denis 优化慢得多。

    在 3 个组合/列上测试:
  • 未优化:1m3s
  • Denis 的优化:1.7s
  • 安迪里的CASE : 9.3s

  • 我还测试了 12 个组合/列:
  • 未优化:未经测试
  • Denis 的优化:3.6s
  • 安迪里的CASE : 13.7s

  • 此外,Andiry 的解决方案还引入了收购日期,其中没有任何选定组合的数据,但存在其他组合的数据。

    Immagine 控制单元 1 每 30 分钟在 :00 和 :30 获取数据,而控制单元 2 在 :15 和 :45:我将使用 NULL 填充的空行将行数加倍。

    注:

    这完全是关于一个传感器系统:有几个 控制单元 (每个 id_cu 一个)与许多 传感器 每个。

    单个传感器由 id_cu / id_meas 标识夫妇和发送不同 详细说明对于每个度量,例如 MIN ( id_elab=1 )、MAX ( id_elab=2 )、AVERAGE ( id_elab=3 )、INSTANT ( id_elab=... ) 等,每个 id_elab 一个.

    用户可以自由选择他想要的详细说明,例如:
  • 结果列的控制单元 #1 的传感器 #3 的平均值 (3) 所以 id_cu=1 / id_meas=3 / id_elab=3
  • 结果列的控制单元 #1 的传感器 #5 的平均值 (3) 所以 id_cu=1 / id_meas=5 / id_elab=3
  • 用于另一列的控制单元 #4 的传感器 #2 的最小值 (1) 所以 id_cu=4 / id_meas=2 / id_elab=1
  • (输入任何有效的 id_cu, id_meas, id_elab 组合)
  • ...

  • 依此类推,多达数十种选择……

    这是部分 DDL(排除不相关的表):
    CREATE TABLE acquisitions (
    id INTEGER NOT NULL AUTO_INCREMENT,
    id_cu INTEGER NOT NULL,
    datetime DATETIME NOT NULL,
    PRIMARY KEY (id),
    UNIQUE (id_cu, datetime),
    FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
    )

    CREATE TABLE data (
    id INTEGER NOT NULL AUTO_INCREMENT,
    id_acq INTEGER NOT NULL,
    id_meas INTEGER NOT NULL,
    id_elab INTEGER NOT NULL,
    value FLOAT,
    PRIMARY KEY (id),
    FOREIGN KEY(id_acq) REFERENCES acquisitions (id) ON DELETE CASCADE
    )

    CREATE TABLE ctrl_units (
    id INTEGER NOT NULL,
    name VARCHAR(40) NOT NULL,
    PRIMARY KEY (id)
    )

    CREATE TABLE sensors (
    id_cu INTEGER NOT NULL,
    id_meas INTEGER NOT NULL,
    id_elab INTEGER NOT NULL,
    name VARCHAR(40) NOT NULL,
    `desc` VARCHAR(80),
    PRIMARY KEY (id_cu, id_meas),
    FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
    )

    最佳答案

    主要有以下三个问题:

  • 使用 union all,而不是 union。您正在分组和获取最小/最大值,因此引入排序步骤来删除重复行是没有意义的。
  • where 子句可以放在每个 union 子语句中:
    select ...
    from (
    select ... from ... where ...
    union all
    select ... from ... where ...
    union all
    ...
    )
    group by ...

    按照您编写的方式,它首先获取所有行,将它们全部附加,最后过滤您需要的行。在 union 子语句中注入(inject) where 子句将使其仅获取您需要的行,最后将它们全部追加。
  • 同样,预聚合聚合:
    select ..., max(foo) as foo
    from (
    select ..., max(foo) as foo from ... where ... group by ...
    union all
    select ..., max(foo) as foo from ... where ... group by ...
    union all
    ...
    )
    group by ...

    优化器将更好地利用现有索引,您最终只会附加几行,而不是数百万行。
  • 关于mysql - 优化从同一个表中提取的多列的查询,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/5938181/

    27 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com