gpt4 book ai didi

postgresql - 如果我使用另一个 ORDER BY,为什么 postgres 不再使用索引?

转载 作者:行者123 更新时间:2023-12-04 07:37:00 26 4
gpt4 key购买 nike

我没有在这里找到我的问题的答案,所以我想我会问:
我有下表,大约有 18kk 行:

# SELECT COUNT(1) from report;
count
----------
18090892
(1 row)

# \d report
Table "public.report"
Column | Type | Collation | Nullable | Default
-------------+--------------------------+-----------+----------+---------
reporter_id | uuid | | not null |
parsed | boolean | | not null |
id | text | | not null |
request_id | uuid | | |
created | timestamp with time zone | | not null | now()
customer | text | | not null |
subject | text | | |
Indexes:
"PK_99e4d0bea58cba73c57f935a546" PRIMARY KEY, btree (id)
"idx_report_created_desc" btree (created DESC)
"idx_report_reporter_id_asc_created_desc" btree (reporter_id, created DESC)
"idx_report_request_id_asc_created_desc" btree (request_id, created DESC)
Foreign-key constraints:
"FK_5b809608bb38d119333b69f65f9" FOREIGN KEY (request_id) REFERENCES request(id)
"FK_d41df66b60944992386ed47cf2e" FOREIGN KEY (reporter_id) REFERENCES reporter(id)
如果我使用 ORDER BY created DESC LIMIT 25使用索引:
# EXPLAIN ANALYZE SELECT * FROM report ORDER BY created DESC LIMIT 25;
QUERY PLAN

----------------------------------------------------------------------------------------------------------
--------------------------------------------
Limit (cost=0.44..2.49 rows=25 width=169) (actual time=0.035..0.063 rows=25 loops=1)
-> Index Scan using idx_report_created_desc on report (cost=0.44..1482912.16 rows=18090892 width=169)
(actual time=0.033..0.051 rows=25 loops=1)
Planning Time: 0.239 ms
Execution Time: 0.105 ms
(4 rows)
但是,如果我使用 ORDER BY created DESC, id ASC LIMIT 25 ,不再使用索引:
# EXPLAIN ANALYZE SELECT * FROM "report" ORDER BY "created" DESC, "id" ASC LIMIT 25;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=587891.07..587893.99 rows=25 width=169) (actual time=2719.606..2726.355 rows=25 loops=1)
-> Gather Merge (cost=587891.07..2346850.67 rows=15075744 width=169) (actual time=2711.873..2718.618 rows=25 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=586891.04..605735.72 rows=7537872 width=169) (actual time=2643.445..2643.448 rows=21 loops=3)
Sort Key: created DESC, id
Sort Method: top-N heapsort Memory: 35kB
Worker 0: Sort Method: top-N heapsort Memory: 32kB
Worker 1: Sort Method: top-N heapsort Memory: 31kB
-> Parallel Seq Scan on report (cost=0.00..374177.72 rows=7537872 width=169) (actual time=0.018..1910.204 rows=6030297 loops=3)
Planning Time: 0.396 ms
JIT:
Functions: 1
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 4.757 ms, Inlining 0.172 ms, Optimization 5.053 ms, Emission 2.003 ms, Total 11.985 ms
Execution Time: 2731.226 ms
(16 rows)
如果我理解正确,应该仍然使用索引,因为应该返回相同的结果集,只是可能以不同的顺序由 ORDER BY id ASC 确定。 .
所以我想知道为什么 postgres 决定进行并行 seq 扫描而不是使用索引,然后按 id 对返回的 25 行进行排序?这绝对应该比并行 seq 扫描快,不是吗?
或者我哪里错了?

最佳答案

PostgreSQL 并不是无限聪明。有些事情它没有弄清楚,即使它们在理论上是可能被弄清楚的。
但它一直在变得越来越聪明。升级到版本 13,看看会发生什么。它应该使用索引扫描加上非常快速的“增量排序”。增量排序只需要打破“创建”之间的联系,我假设这种情况很少见。

If I understand correctly, the index should still be used because the same set of results should be returned, only possibly in a different order determined by ORDER BY id ASC


但是在存在 LIMIT 的情况下,以不同的顺序返回结果意味着返回不同结果的可能性。因此,它必须采取特殊措施来解决这个问题。在 v13 中确实如此。

关于postgresql - 如果我使用另一个 ORDER BY,为什么 postgres 不再使用索引?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/67690904/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com