gpt4 book ai didi

sql - 如何调整 postgres 中的 index_scan 成本?

转载 作者:行者123 更新时间:2023-11-29 11:49:51 25 4
gpt4 key购买 nike

对于以下查询:

SELECT *
FROM "routes_trackpoint"
WHERE "routes_trackpoint"."track_id" = 593
ORDER BY "routes_trackpoint"."id" ASC
LIMIT 1;

Postgres 正在选择一个查询计划,该计划读取“id”索引中的所有行以执行排序,并执行手动过滤以获取具有正确轨道 ID 的条目:

Limit  (cost=0.43..511.22 rows=1 width=65) (actual time=4797.964..4797.966 rows=1 loops=1)
Buffers: shared hit=3388505
-> Index Scan using routes_trackpoint_pkey on routes_trackpoint (cost=0.43..719699.79 rows=1409 width=65) (actual time=4797.958..4797.958 rows=1 loops=1)
Filter: (track_id = 75934)
Rows Removed by Filter: 13005436
Buffers: shared hit=3388505
Total runtime: 4798.019 ms
(7 rows)

禁用索引扫描,查询计划(SET enable_indexscan=OFF;)更好,响应更快。

Limit  (cost=6242.46..6242.46 rows=1 width=65) (actual time=77.584..77.586 rows=1 loops=1)
Buffers: shared hit=1075 read=6
-> Sort (cost=6242.46..6246.64 rows=1674 width=65) (actual time=77.577..77.577 rows=1 loops=1)
Sort Key: id
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=1075 read=6
-> Bitmap Heap Scan on routes_trackpoint (cost=53.41..6234.09 rows=1674 width=65) (actual time=70.384..74.782 rows=1454 loops=1)
Recheck Cond: (track_id = 75934)
Buffers: shared hit=1075 read=6
-> Bitmap Index Scan on routes_trackpoint_track_id (cost=0.00..52.99 rows=1674 width=0) (actual time=70.206..70.206 rows=1454 loops=1)
Index Cond: (track_id = 75934)
Buffers: shared hit=2 read=6
Total runtime: 77.655 ms
(13 rows)

如何让 Postgres 自动选择更好的计划?

我尝试了以下方法:

ALTER TABLE routes_trackpoint ALTER COLUMN track_id SET STATISTICS 5000;
ALTER TABLE routes_trackpoint ALTER COLUMN id SET STATISTICS 5000;
ANALYZE routes_trackpoint;

但查询计划保持不变。

表定义是:

watchdog2=# \d routes_trackpoint
Table "public.routes_trackpoint"
Column | Type | Modifiers
----------+--------------------------+----------------------------------------------------------------
id | integer | not null default nextval('routes_trackpoint_id_seq'::regclass)
track_id | integer | not null
position | geometry(Point,4326) | not null
speed | double precision | not null
bearing | double precision | not null
valid | boolean | not null
created | timestamp with time zone | not null
Indexes:
"routes_trackpoint_pkey" PRIMARY KEY, btree (id)
"routes_trackpoint_position_id" gist ("position")
"routes_trackpoint_track_id" btree (track_id)
Foreign-key constraints:
"track_id_refs_id_d59447ae" FOREIGN KEY (track_id) REFERENCES routes_track(id) DEFERRABLE INITIALLY DEFERRED

PS:我们强制 postgres 按“创建”排序,这也有助于他使用“track_id”上的索引。

最佳答案

尽可能避免LIMIT。计划 #1:使用 NOT EXISTS() 获取第一个

EXPLAIN ANALYZE
SELECT * FROM routes_trackpoint tp
WHERE tp.track_id = 593
AND NOT EXISTS (
SELECT * FROM routes_trackpoint nx
WHERE nx.track_id = tp.track_id AND nx.id < tp.id
);

计划 #2:使用 row_number() OVER some_window 获取组中的第一个。

EXPLAIN ANALYZE
SELECT tp.*
FROM routes_trackpoint tp
JOIN (select track_id, id
, row_number() OVER (partition BY track_id ORDER BY id) rn
FROM routes_trackpoint tp2
) omg ON omg.id = tp.id
WHERE tp.track_id = 593
AND omg.rn = 1
;

或者 - 甚至更好 - 将 WHERE 子句移动到子查询:

EXPLAIN ANALYZE
SELECT tp.*
FROM routes_trackpoint tp
JOIN (select track_id, id
, row_number() OVER (partition BY track_id ORDER BY id) rn
FROM routes_trackpoint tp2
WHERE tp2.track_id = 593
) omg ON omg.id = tp.id
WHERE 1=1
-- AND tp.track_id = 593
AND omg.rn = 1
;

计划#3 使用 postgres 特定的 DISTINCT ON() 构造(感谢@a_horse_with_no_name):

-- EXPLAIN ANALYZE
SELECT DISTINCT ON (track_id) track_id, id
FROM routes_trackpoint tp2
WHERE tp2.track_id = 593
-- order by track_id, created desc
order by track_id, id
;

关于sql - 如何调整 postgres 中的 index_scan 成本?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/33856163/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com