mysql - Understanding EXPLAIN statements in similar MySQL and PostgreSQL databases

Reposted. Author: 行者123. Updated: 2023-11-29 13:56:05

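I am currently developing a web service that supports multiple databases. I am trying to optimize the tables and fix missing indexes. Here is the MySQL query: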

SELECT 'UTC' AS timezone, pak.id AS package_id, rel.unique_id AS relay, sns.unique_id AS sensor, pak.rtime AS time,
sns.units AS sensor_units, typ.name AS sensor_type, dat.data AS sensor_data,
loc.altitude AS altitude, Y(loc.location) AS latitude, X(loc.location) as longitude,
loc.speed as speed, loc.climb as climb, loc.track as track,
loc.longitude_error as longitude_error, loc.latitude_error as latitude_error, loc.altitude_error as altitude_error,
loc.speed_error as speed_error, loc.climb_error as climb_error, loc.track_error as track_error
FROM sensor_data dat
LEFT OUTER JOIN package_location loc on dat.package_id = loc.package_id
LEFT OUTER JOIN data_package pak ON dat.package_id = pak.id
LEFT OUTER JOIN relays rel ON pak.relay_id = rel.id
LEFT OUTER JOIN sensors sns ON dat.sensor_id = sns.id
LEFT OUTER JOIN sensor_types typ ON sns.sensor_type = typ.id
WHERE typ.name='Temperature'
AND rel.unique_id='OneWireTester'
AND pak.rtime > '2015-01-01'
AND pak.rtime < '2016-01-01'

And the EXPLAIN output...

+----+-------------+-------+--------+------------------------------------------+----------------------+---------+------------------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+------------------------------------------+----------------------+---------+------------------------+------+----------------------------------------------------+
| 1 | SIMPLE | rel | ALL | PRIMARY | NULL | NULL | NULL | 5 | Using where |
| 1 | SIMPLE | pak | ref | PRIMARY,fk_package_relay_id | fk_package_relay_id | 9 | BigSense.rel.id | 1 | Using index condition; Using where |
| 1 | SIMPLE | dat | ref | fk_sensor_package_id,fk_sensor_sensor_id | fk_sensor_package_id | 9 | BigSense.pak.id | 1 | NULL |
| 1 | SIMPLE | sns | eq_ref | PRIMARY,fk_sensors_type_id | PRIMARY | 8 | BigSense.dat.sensor_id | 1 | NULL |
| 1 | SIMPLE | loc | eq_ref | PRIMARY | PRIMARY | 8 | BigSense.pak.id | 1 | NULL |
| 1 | SIMPLE | typ | ALL | PRIMARY | NULL | NULL | NULL | 5 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+------------------------------------------+----------------------+---------+------------------------+------+----------------------------------------------------+

...seems straightforward. I need to add indexes on the relays and sensor_types tables to optimize the query.
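
A minimal sketch of the indexes this reading of the plan points to. The index names idx_relays_unique_id and idx_sensor_types_name are made up for illustration, and the statements assume that relays.unique_id and sensor_types.name are VARCHAR columns:

```sql
-- Hypothetical index names. Assumes both columns are VARCHAR;
-- a TEXT column in MySQL would need a prefix length, e.g. (name(64)).
CREATE INDEX idx_relays_unique_id ON relays (unique_id);
CREATE INDEX idx_sensor_types_name ON sensor_types (name);
```

Re-running EXPLAIN afterwards shows whether the optimizer actually picks them up. With only about 5 rows estimated for each table, it may still prefer a full scan.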

The PostgreSQL version of the tables is nearly identical. However, when I use the following query:

SELECT 'UTC' AS timezone, pak.id AS package_id, rel.unique_id AS relay, sns.unique_id AS sensor, pak.rtime AS time,
sns.units AS sensor_units, typ.name AS sensor_type, dat.data AS sensor_data,
loc.altitude AS altitude, ST_Y(loc.location::geometry) AS latitude, ST_X(loc.location::geometry) as longitude,
loc.speed as speed, loc.climb as climb, loc.track as track,
loc.longitude_error as longitude_error, loc.latitude_error as latitude_error, loc.altitude_error as altitude_error,
loc.speed_error as speed_error, loc.climb_error as climb_error, loc.track_error as track_error
FROM sensor_data dat
LEFT OUTER JOIN package_location loc on dat.package_id = loc.package_id
LEFT OUTER JOIN data_package pak ON dat.package_id = pak.id
LEFT OUTER JOIN relays rel ON pak.relay_id = rel.id
LEFT OUTER JOIN sensors sns ON dat.sensor_id = sns.id
LEFT OUTER JOIN sensor_types typ ON sns.sensor_type = typ.id
WHERE typ.name='Temperature'
AND rel.unique_id='OneWireTester'
AND pak.rtime > '2015-01-01'
AND pak.rtime < '2016-01-01';

Running EXPLAIN ANALYZE on it gives the following:

                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (cost=36.23..131.80 rows=1 width=477) (actual time=0.074..3.933 rows=76 loops=1)
   -> Nested Loop (cost=36.09..131.60 rows=1 width=349) (actual time=0.068..3.782 rows=76 loops=1)
         -> Nested Loop (cost=35.94..130.58 rows=4 width=267) (actual time=0.062..2.472 rows=620 loops=1)
               -> Hash Join (cost=35.67..128.73 rows=4 width=247) (actual time=0.053..0.611 rows=620 loops=1)
                     Hash Cond: (dat.sensor_id = sns.id)
                     -> Seq Scan on sensor_data dat (cost=0.00..89.46 rows=946 width=21) (actual time=0.007..0.178 rows=1006 loops=1)
                     -> Hash (cost=35.64..35.64 rows=2 width=238) (actual time=0.037..0.037 rows=11 loops=1)
                           Buckets: 1024 Batches: 1 Memory Usage: 1kB
                           -> Hash Join (cost=20.68..35.64 rows=2 width=238) (actual time=0.019..0.035 rows=11 loops=1)
                                 Hash Cond: (sns.sensor_type = typ.id)
                                 -> Seq Scan on sensors sns (cost=0.00..13.60 rows=360 width=188) (actual time=0.002..0.005 rows=31 loops=1)
                                 -> Hash (cost=20.62..20.62 rows=4 width=66) (actual time=0.010..0.010 rows=1 loops=1)
                                       Buckets: 1024 Batches: 1 Memory Usage: 1kB
                                       -> Seq Scan on sensor_types typ (cost=0.00..20.62 rows=4 width=66) (actual time=0.006..0.008 rows=1 loops=1)
                                             Filter: ((name)::text = 'Temperature'::text)
                                             Rows Removed by Filter: 4
               -> Index Scan using data_package_pkey on data_package pak (cost=0.28..0.45 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=620)
                     Index Cond: (id = dat.package_id)
                     Filter: ((rtime > '2015-01-01 00:00:00'::timestamp without time zone) AND (rtime < '2016-01-01 00:00:00'::timestamp without time zone))
         -> Index Scan using relays_pkey on relays rel (cost=0.14..0.24 rows=1 width=94) (actual time=0.002..0.002 rows=0 loops=620)
               Index Cond: (id = pak.relay_id)
               Filter: ((unique_id)::text = 'OneWireTester'::text)
               Rows Removed by Filter: 1
   -> Index Scan using package_location_pkey on package_location loc (cost=0.14..0.18 rows=1 width=140) (actual time=0.001..0.001 rows=0 loops=76)
         Index Cond: (dat.package_id = package_id)
 Planning time: 0.959 ms
 Execution time: 4.030 ms
(27 rows)

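The table schemas have the same foreign keys and general structure, so I expected to see the same indexes being needed. However, I have been going through several guides on EXPLAIN output for PostgreSQL, and from what I have gathered, Seq Scan entries are an indicator of missing indexes, which would mean I am missing indexes on sensors, sensor_data, and sensor_types.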

Am I interpreting the results of these EXPLAIN statements correctly? What should I be looking for in order to optimize both databases?

Best answer

In PostgreSQL (and probably in MySQL too), indexes are not used simply because they have been defined; they are used when they can make the query faster.

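In the EXPLAIN ANALYZE output, each node shows a cost section in parentheses, followed by a similar actual time section. The query planner works from the cost, which is defined by a number of parameters listed in the configuration file. These costs represent things like IO and CPU time, with the former typically weighted much higher than the latter (often by a factor of 100). This means the planner tries to minimize the amount of data that has to be read from disk, which is read in pages of a predetermined size (8kB by default in PostgreSQL) rather than as individual rows, because the physical characteristics of hard drives make page-sized reads faster.

Both the table itself and its indexes are stored on disk. If a table is small, it fits in a few pages, maybe even a single one. Since CPU time is cheap compared to IO time, sequentially scanning a few pages is much faster than incurring the extra IO of reading index pages from disk as well.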
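
To make the cost model a little more concrete, here is a rough sketch for PostgreSQL. seq_page_cost, random_page_cost, and cpu_tuple_cost are real planner settings. The second query assumes the simple formula for an unfiltered sequential scan (pages × seq_page_cost + rows × cpu_tuple_cost), which only approximates what the planner does:

```sql
-- Planner cost settings (defaults: 1.0, 4.0 and 0.01 respectively).
SHOW seq_page_cost;
SHOW random_page_cost;
SHOW cpu_tuple_cost;

-- Approximate the planner's estimate for an unfiltered Seq Scan
-- from the table statistics stored in pg_class.
SELECT relname,
       relpages,
       reltuples,
       relpages  * current_setting('seq_page_cost')::float8
     + reltuples * current_setting('cpu_tuple_cost')::float8 AS est_seq_scan_cost
FROM pg_class
WHERE relname = 'sensor_data';
```

Assuming default settings, the plan in the question is consistent with this: the Seq Scan on sensor_data is costed at 89.46 for 946 estimated rows, which would match a table of about 80 pages (80 × 1.0 + 946 × 0.01 = 89.46).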

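As you can tell from the EXPLAIN ANALYZE output, most of your tables are small and fit in just a few pages. If you really want to test what the indexes do, load your tables with a million or so rows of random data and then run the tests.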
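
One way to try this in PostgreSQL without touching the real schema is sketched below. The table demo_readings, its columns, and the index name are hypothetical and exist only for this experiment:

```sql
-- Hypothetical demo table; not part of the schema in the question.
CREATE TABLE demo_readings (
    id        bigserial PRIMARY KEY,
    sensor_id integer NOT NULL,
    reading   double precision NOT NULL
);

-- Load one million rows of random data.
INSERT INTO demo_readings (sensor_id, reading)
SELECT (random() * 1000)::integer, random() * 100
FROM generate_series(1, 1000000);

-- Refresh statistics, then look at the plan while sensor_id has no index.
ANALYZE demo_readings;
EXPLAIN ANALYZE SELECT * FROM demo_readings WHERE sensor_id = 42;

-- Add the index and compare the plan and timings.
CREATE INDEX idx_demo_readings_sensor_id ON demo_readings (sensor_id);
EXPLAIN ANALYZE SELECT * FROM demo_readings WHERE sensor_id = 42;
```

Comparing the two EXPLAIN ANALYZE outputs shows whether the planner switches away from scanning the whole table once the index exists and the table is large enough for it to pay off.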

Regarding "mysql - Understanding EXPLAIN statements in similar MySQL and PostgreSQL databases", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31712450/
