
postgresql - Postgres underestimates row count, leading to a bad query plan

Reposted. Author: 行者123  Updated: 2023-12-05 04:24:00

I have a query that takes 2.5 seconds to run. Inspecting the query plan, I found that Postgres badly underestimates the number of rows, which leads it into a nested loop.
Here is the query:

explain analyze 
SELECT
reprocessed_videos.video_id AS reprocessed_videos_video_id
FROM
reprocessed_videos
JOIN commit_info ON commit_info.id = reprocessed_videos.commit_id
WHERE
commit_info.tag = 'stop_sign_tbc_inertial_fix'
AND reprocessed_videos.reprocess_type_id = 28
AND reprocessed_videos.classification_crop_type_id = 0
AND reprocessed_videos.reprocess_status = 'success';

Here is the EXPLAIN ANALYZE output:

 Nested Loop  (cost=0.84..22941.18 rows=1120 width=4) (actual time=31.169..2650.181 rows=179524 loops=1)
   ->  Index Scan using commit_info_tag_key on commit_info  (cost=0.28..8.29 rows=1 width=4) (actual time=0.395..0.397 rows=1 loops=1)
         Index Cond: ((tag)::text = 'stop_sign_tbc_inertial_fix'::text)
   ->  Index Scan using ix_reprocessed_videos_commit_id on reprocessed_videos  (cost=0.56..22919.99 rows=1289 width=8) (actual time=30.770..2634.546 rows=179524 loops=1)
         Index Cond: (commit_id = commit_info.id)
         Filter: ((reprocess_type_id = 28) AND (classification_crop_type_id = 0) AND ((reprocess_status)::text = 'success'::text))
         Rows Removed by Filter: 1190
 Planning Time: 0.326 ms
 Execution Time: 2657.724 ms


We can see that the index scan using ix_reprocessed_videos_commit_id is estimated at 1289 rows, while it actually produces 179524. I have been trying to figure out what causes this, but nothing I have tried has worked.

Here is what I have tried:

  1. VACUUMed and ANALYZEd all the tables involved. (This may have helped a little, but not much, since the tables are autovacuumed and autoanalyzed anyway.)
  2. Increased the statistics target for the commit_id column: alter table reprocessed_videos alter column commit_id set statistics 1000; (helped somewhat)
  3. I read about extended statistics, but I am not sure whether they would be useful here.
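For reference, a sketch of the maintenance commands behind items 1 and 2 above (table and column names are taken from the schema shown below):

```sql
-- Item 1: refresh dead-tuple bookkeeping and statistics for both tables
VACUUM ANALYZE reprocessed_videos;
VACUUM ANALYZE commit_info;

-- Item 2: raise the per-column statistics target for commit_id;
-- the new target only takes effect after the next ANALYZE
ALTER TABLE reprocessed_videos ALTER COLUMN commit_id SET STATISTICS 1000;
ANALYZE reprocessed_videos;
```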

Here are the tuple counts for each table:

kpis=> SELECT relname, reltuples FROM pg_class WHERE relname IN ('reprocessed_videos', 'video_catalog', 'commit_info');
      relname       |   reltuples
--------------------+---------------
 commit_info        |          1439
 reprocessed_videos | 3.1563756e+07

Here is some information about the table schemas:

                                       Table "public.reprocessed_videos"
           Column            |            Type             | Collation | Nullable |                    Default
-----------------------------+-----------------------------+-----------+----------+------------------------------------------------
 id                          | integer                     |           | not null | nextval('reprocessed_videos_id_seq'::regclass)
 video_id                    | integer                     |           |          |
 reprocess_status            | character varying           |           |          |
 commit_id                   | integer                     |           |          |
 reprocess_type_id           | integer                     |           |          |
 classification_crop_type_id | integer                     |           |          |
Indexes:
    "reprocessed_videos_pkey" PRIMARY KEY, btree (id)
    "ix_reprocessed_videos_commit_id" btree (commit_id)
    "ix_reprocessed_videos_video_id" btree (video_id)
    "reprocessed_videos_video_commit_reprocess_crop_key" UNIQUE CONSTRAINT, btree (video_id, commit_id, reprocess_type_id, classification_crop_type_id)
Foreign-key constraints:
    "reprocessed_videos_commit_id_fkey" FOREIGN KEY (commit_id) REFERENCES commit_info(id)

                                Table "public.commit_info"
 Column |       Type        | Collation | Nullable |                 Default
--------+-------------------+-----------+----------+-----------------------------------------
 id     | integer           |           | not null | nextval('commit_info_id_seq'::regclass)
 tag    | character varying |           |          |
 commit | character varying |           |          |

Indexes:
    "commit_info_pkey" PRIMARY KEY, btree (id)
    "commit_info_tag_key" UNIQUE CONSTRAINT, btree (tag)

I am convinced that Postgres should not be using nested loops in this case, and that it only does so because of the bad row estimate. Any help is greatly appreciated.

Here are the experiments I have tried:

  1. Disabling index scans:
 Nested Loop  (cost=734.59..84368.70 rows=1120 width=4) (actual time=274.694..934.965 rows=179524 loops=1)
   ->  Bitmap Heap Scan on commit_info  (cost=4.29..8.30 rows=1 width=4) (actual time=0.441..0.444 rows=1 loops=1)
         Recheck Cond: ((tag)::text = 'stop_sign_tbc_inertial_fix'::text)
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on commit_info_tag_key  (cost=0.00..4.29 rows=1 width=0) (actual time=0.437..0.439 rows=1 loops=1)
               Index Cond: ((tag)::text = 'stop_sign_tbc_inertial_fix'::text)
   ->  Bitmap Heap Scan on reprocessed_videos  (cost=730.30..84347.51 rows=1289 width=8) (actual time=274.250..920.137 rows=179524 loops=1)
         Recheck Cond: (commit_id = commit_info.id)
         Filter: ((reprocess_type_id = 28) AND (classification_crop_type_id = 0) AND ((reprocess_status)::text = 'success'::text))
         Rows Removed by Filter: 1190
         Heap Blocks: exact=5881
         ->  Bitmap Index Scan on ix_reprocessed_videos_commit_id  (cost=0.00..729.98 rows=25256 width=0) (actual time=273.534..273.534 rows=180714 loops=1)
               Index Cond: (commit_id = commit_info.id)
 Planning Time: 0.413 ms
 Execution Time: 941.874 ms

I had also set the updated statistics target for the commit_id column. I observed roughly a 3x speedup.

  2. When I disabled bitmap scans as well, the query performed a sequential scan and took 19 seconds to run.

Best Answer

A nested loop is the perfect join strategy here, because commit_info contributes only a single row. Any other join strategy would lose.

The question is whether the index scan on reprocessed_videos is really too slow. To experiment, retry after SET enable_indexscan = off; to get a bitmap index scan, and see whether that is better. Then also SET enable_bitmapscan = off; to get a sequential scan. I suspect that your current plan will win, but the bitmap index scan has a good chance.
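The experiment above can be run in one psql session; the enable_* settings only affect the current session and can be reset afterwards:

```sql
-- Rule out plain index scans so the planner picks a bitmap index scan
SET enable_indexscan = off;
EXPLAIN ANALYZE
SELECT reprocessed_videos.video_id
FROM reprocessed_videos
JOIN commit_info ON commit_info.id = reprocessed_videos.commit_id
WHERE commit_info.tag = 'stop_sign_tbc_inertial_fix'
  AND reprocessed_videos.reprocess_type_id = 28
  AND reprocessed_videos.classification_crop_type_id = 0
  AND reprocessed_videos.reprocess_status = 'success';

-- Additionally rule out bitmap scans to force a sequential scan
SET enable_bitmapscan = off;
EXPLAIN ANALYZE
SELECT reprocessed_videos.video_id
FROM reprocessed_videos
JOIN commit_info ON commit_info.id = reprocessed_videos.commit_id
WHERE commit_info.tag = 'stop_sign_tbc_inertial_fix'
  AND reprocessed_videos.reprocess_type_id = 28
  AND reprocessed_videos.classification_crop_type_id = 0
  AND reprocessed_videos.reprocess_status = 'success';

-- Restore the planner defaults
RESET enable_indexscan;
RESET enable_bitmapscan;
```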


If the bitmap index scan is better, you should indeed try to improve the estimate:

ALTER TABLE reprocessed_videos ALTER commit_id SET STATISTICS 1000;
ANALYZE reprocessed_videos;

You can experiment with other values; pick the lowest value that gives you a good enough estimate.
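To judge whether a given target is good enough, you can look at the per-column statistics Postgres actually stored after ANALYZE; the pg_stats system view exposes the figures that drive the row estimate for commit_id:

```sql
-- n_distinct and the most-common-value list determine the planner's
-- estimate of rows per commit_id value
SELECT n_distinct,
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE tablename = 'reprocessed_videos'
  AND attname = 'commit_id';
```

If the commit_id you query for does not appear in most_common_vals, the planner falls back to an average based on n_distinct, which is exactly how a heavily skewed value like this one gets underestimated.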

Another thing to try is extended statistics:

CREATE STATISTICS corr (dependencies)
ON reprocess_type_id, classification_crop_type_id, reprocess_status
FROM reprocessed_videos;

ANALYZE reprocessed_videos;

Perhaps you don't even need all three columns in there; play with it.
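After the ANALYZE, the functional dependencies that Postgres computed can be inspected through the pg_stats_ext view (available since PostgreSQL 12), which helps decide whether the statistics object is pulling its weight:

```sql
-- Show the dependency coefficients derived for the statistics object;
-- values close to 1 mean one column strongly determines another
SELECT statistics_name, attnames, dependencies
FROM pg_stats_ext
WHERE tablename = 'reprocessed_videos';
```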


If the bitmap index scan does not provide enough of a benefit, there is one way to speed up the current index scan:

CLUSTER reprocessed_videos USING ix_reprocessed_videos_commit_id;

This rewrites the table in index order (and blocks concurrent access while it runs, so be careful!). After that, the index scan is likely to be much faster. The order is not maintained, however, so you will have to repeat the CLUSTER occasionally once enough of the table has been modified.
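Whether CLUSTER is worth its cost can be gauged beforehand: pg_stats.correlation reports how well the table's physical row order already matches the column order (values near 1 or -1 mean the heap is nearly sorted and CLUSTER would gain little; values near 0 mean matching rows are scattered and CLUSTER should help a lot):

```sql
-- Physical-order correlation for commit_id; recomputed by ANALYZE
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'reprocessed_videos'
  AND attname = 'commit_id';
```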

Regarding "postgresql - Postgres underestimates row count, leading to a bad query plan", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/73581823/
