gpt4 book ai didi

sql - Postgresql 解释计划差异

转载 作者:行者123 更新时间:2023-11-29 11:28:58 24 4
gpt4 key购买 nike

这是我的第一篇文章......

我有一个查询花费的时间比我想要的要长(难道我们都不是!)取决于我在 WHERE 子句中输入的内容……它可能运行得更快。我想了解为什么查询计划不同和我可以做些什么来加快查询速度。

这是查询 #1:

SELECT date_observed, base_value 
FROM device_read_data
WHERE fk_device_rw_id IN
(SELECT fk_device_rw_id FROM equipment_set_rw
WHERE fk_equipment_set_id = CAST('ed151028-1fc0-11e3-b79f-47c0fd87d2b4' AS uuid))
AND date_observed
BETWEEN '2013-12-01 07:45:00+00'::timestamptz
AND '2014-01-01 07:59:59+00'::timestamptz
AND base_value ~ '[0-9]+(\.[0-9]+)?'
;

Here's Query Plan #1 :

"Hash Semi Join  (cost=11.65..5640243.59 rows=92194 width=16) (actual time=34.947..132522.023 rows=43609 loops=1)"
" Hash Cond: (device_read_data.fk_device_rw_id = equipment_set_rw.fk_device_rw_id)"
" -> Seq Scan on device_read_data (cost=0.00..5449563.56 rows=72157042 width=32) (actual time=0.844..123760.331 rows=71764376 loops=1)"
" Filter: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone) AND ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text))"
" Rows Removed by Filter: 82135660"
" -> Hash (cost=11.61..11.61 rows=3 width=16) (actual time=0.018..0.018 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..11.61 rows=3 width=16) (actual time=0.016..0.016 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=3 width=0) (actual time=0.011..0.011 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
"Total runtime: 132530.290 ms"

这是查询#2:

SELECT date_observed, base_value 
FROM device_read_data
WHERE fk_device_rw_id IN
(SELECT fk_device_rw_id FROM equipment_set_rw
WHERE fk_equipment_set_id = CAST('ed151028-1fc0-11e3-b79f-47c0fd87d2b4' AS uuid))
AND date_observed
BETWEEN '2014-01-01 07:45:00+00'::timestamptz
AND '2014-02-01 07:59:59+00'::timestamptz
AND base_value ~ '[0-9]+(\.[0-9]+)?'
;

Here's Query Plan #2 :

"Nested Loop  (cost=4.27..1869543.46 rows=20391 width=16) (actual time=0.041..2053.656 rows=12997 loops=1)"
" -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..9.73 rows=2 width=16) (actual time=0.015..0.017 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=2 width=0) (actual time=0.010..0.010 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Index Scan using idx_device_read_data_date_observed_fk_device_rw_id on device_read_data (cost=0.00..934664.91 rows=10195 width=32) (actual time=0.024..2050.656 rows=12997 loops=1)"
" Index Cond: ((date_observed >= '2014-01-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-02-01 07:59:59+00'::timestamp with time zone) AND (fk_device_rw_id = equipment_set_rw.fk_device_rw_id))"
" Filter: ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text)"
"Total runtime: 2055.068 ms"

我只更改了 Where 子句中的日期范围。您可以看到在查询 #1 中有对表的 Seq 扫描,而在查询 #2 中有索引扫描。

我正在尝试确定是什么原因造成的,但我似乎找不到答案。

附加信息

  • 在 (date_observed, fk_device_rw_id) 上有一个复合索引
  • 此表上从未有任何删除。不需要 Autovacuum。
  • 无论如何我都用吸尘器吸了 table ....但这没有效果。
  • 我已经重建了这张表的索引
  • 我已经分析了这张表
  • 该系统是 Prod 的副本,目前处于空闲状态

系统信息

  • 在 Linux 上运行 Postgres 9.2
  • 16GB 系统内存
  • Shared_Buffers 设置为 4GB

我还可以提供哪些其他信息?我确信我遗漏了一些东西。

感谢您的帮助。

编辑 1

我试过:设置 enable_seqscan = false

这里是解释计划的结果:

"Hash Semi Join  (cost=2566484.50..7008502.81 rows=92194 width=16) (actual  time=18587.453..182228.966 rows=43609 loops=1)"
" Hash Cond: (device_read_data.fk_device_rw_id = equipment_set_rw.fk_device_rw_id)"
" -> Bitmap Heap Scan on device_read_data (cost=2566472.85..6817822.78 rows=72157042 width=32) (actual time=18562.247..172074.048 rows=71764376 loops=1)"
" Recheck Cond: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone))"
" Rows Removed by Index Recheck: 2102"
" Filter: ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text)"
" Rows Removed by Filter: 12265137"
" -> Bitmap Index Scan on idx_device_read_data_date_observed_fk_device_rw_id (cost=0.00..2548433.59 rows=85430682 width=0) (actual time=18556.228..18556.228 rows=84029513 loops=1)"
" Index Cond: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone))"
" -> Hash (cost=11.61..11.61 rows=3 width=16) (actual time=16.134..16.134 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Bitmap Heap Scan on equipment_set_rw (cost=4.27..11.61 rows=3 width=16) (actual time=16.128..16.129 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=3 width=0) (actual time=16.116..16.116 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
"Total runtime: 182244.181 ms"

正如预测的那样,查询花费了更长的时间。是否有太多的记录可以使它更快?

我有什么选择?

谢谢。

编辑2

我尝试了重写方法。恐怕结果与原始结果相似。这是查询计划:

"Hash Join  (cost=11.65..6013386.19 rows=90835 width=16) (actual time=35.272..127965.785 rows=43609 loops=1)"
" Hash Cond: (a.fk_device_rw_id = b.fk_device_rw_id)"
" -> Seq Scan on device_read_data a (cost=0.00..5565898.74 rows=71450793 width=32) (actual time=13.050..119667.814 rows=71764376 loops=1)"
" Filter: ((date_observed >= '2013-12-01 07:45:00+00'::timestamp with time zone) AND (date_observed <= '2014-01-01 07:59:59+00'::timestamp with time zone) AND ((base_value)::text ~ '[0-9]+(\.[0-9]+)?'::text))"
" Rows Removed by Filter: 85426425"
" -> Hash (cost=11.61..11.61 rows=3 width=16) (actual time=0.018..0.018 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Bitmap Heap Scan on equipment_set_rw b (cost=4.27..11.61 rows=3 width=16) (actual time=0.015..0.016 rows=1 loops=1)"
" Recheck Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
" -> Bitmap Index Scan on uc_fk_equipment_set_id_fk_device_rw_id (cost=0.00..4.27 rows=3 width=0) (actual time=0.011..0.011 rows=1 loops=1)"
" Index Cond: (fk_equipment_set_id = 'ed151028-1fc0-11e3-b79f-47c0fd87d2b4'::uuid)"
"Total runtime: 127992.849 ms"

这似乎是一个简单的问题。从表中返回特定日期范围内的记录。考虑到我现有的系统架构,在性能受到不利影响之前,表中可以存在多少记录可能存在一个阈值。

除非有其他建议,否则我可能需要采用分区方法。

感谢迄今为止的帮助!

最佳答案

在您的第一个查询中,您的日期范围跨越了整整一个月,而不是在第二个查询中只是一天。第一个查询中的日期范围与 device_read_data 中约 1.54 亿行中的 7200 万行匹配,这几乎是该表中行的一半。

对于那么多行,索引扫描通常比全表扫描慢(因为索引扫描必须读取索引页和数据页,获得那么多行所需的磁盘读取总数可能大于仅读取每个数据页)。

您可以 set enable_seq_scan = false在运行第一个查询以查看差异之前,如果您喜欢冒险,请将您的解释运行为 explain (analyze, buffers) <query>查看在执行表扫描与索引扫描时获得了多少 block 读取。

编辑:对于您的特定问题,使用部分索引可能会有些运气。您必须弄清楚如何构建它们,以便它们尽可能广地撒网(为每个问题编写部分索引很诱人但很浪费),但您可以从这样的事情开始:

create index idx_device_read_data_date_observed_base_value
on device_read_data (date_observed)
where base_value ~ '[0-9]+(\.[0-9]+)?'
;

该索引只会为那些匹配 base_value 的行建立图案。如果这是一个相当严格的条件,您会比我们更清楚(如果它确实减少了要考虑的行数,这对您有好处)。

您也可以翻转该想法并在与该模式匹配的 base_value 上建立索引,并使您的 where 条件类似于 date_observed between '2013-12-01 and '2013-12-31' ,每个月添加一个这样的索引(这种方式很可能会失去对索引的控制——我会切换到分区)。

另一个潜在的改进可能来自重写您的查询。这是一种消除 IN 的方法条件,如果没有重复 fk_device_rw_id,它会提供相同的结果在equipment_set_rw对于给定的 fk_equipment_set_id .

SELECT a.date_observed, a.base_value 
FROM device_read_data a
join equipment_set_rw b
on a.fk_device_rw_id = b.fk_device_rw_id
WHERE b.fk_equipment_set_id = CAST('ed151028-1fc0-11e3-b79f-47c0fd87d2b4' AS uuid)
AND a.date_observed BETWEEN '2014-01-01 07:45:00+00'::timestamptz
AND '2014-02-01 07:59:59+00'::timestamptz
AND a.base_value ~ '[0-9]+(\.[0-9]+)?'
;

关于sql - Postgresql 解释计划差异,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/21100263/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com