gpt4 book ai didi

sql - Postgres : why LEFT JOIN affects to query plan?

转载 作者:行者123 更新时间:2023-11-29 11:29:36 26 4
gpt4 key购买 nike

我有 PostgreSQL 9.5.9 和两个表:table1 和 table2

 Column   |              Type              |                 Modifiers                 
------------+--------------------------------+-------------------------------------------
id | integer | not null
status | integer | not null
table2_id | integer |
start_date | timestamp(0) without time zone | default NULL::timestamp without time zone
Indexes:
"table1_pkey" PRIMARY KEY, btree (id)
"table1_start_date" btree (start_date)
"table1_table2" btree (table2_id)
Foreign-key constraints:
"fk_t1_t2" FOREIGN KEY (table2_id) REFERENCES table2(id)


Column | Type | Modifiers
--------+-------------------------+---------------------------------
id | integer | not null
name | character varying(2000) | default NULL::character varying
Indexes:
"table2_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "table1" CONSTRAINT "fk_t1_t2" FOREIGN KEY (table2_id) REFERENCES table2(id)

table2 仅包含 3 行; table1 包含大约 400000 行,其中只有一半在 table_2_id 列中有一些值。

当我从按 start_date 列排序的 table1 中选择一些值时,查询速度足够快,因为有效地使用了 table1_start_date 索引:

SELECT t1.*
FROM table1 AS t1
ORDER BY t1.start_date DESC
LIMIT 25 OFFSET 150000;

EXPLAIN ANALYZE 结果

   Limit  (cost=7797.40..7798.70 rows=25 width=20) (actual time=40.994..41.006 rows=25 loops=1)
-> Index Scan Backward using table1_start_date on table1 t1 (cost=0.42..20439.74 rows=393216 width=20) (actual time=0.078..36.251 rows=150025
loops=1)
Planning time: 0.097 ms
Execution time: 41.033 ms

但是当我添加 LEFT JOIN 以从 table2 获取值时,查询变得非常慢:

SELECT t1.*, t2.*
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t2.id = t1.table2_id
ORDER BY t1.start_date DESC
LIMIT 25 OFFSET 150000;

EXPLAIN ANALYZE 结果:

 Limit  (cost=33690.80..33696.42 rows=25 width=540) (actual time=191.282..191.320 rows=25 loops=1)
-> Nested Loop Left Join (cost=0.57..88317.50 rows=393216 width=540) (actual time=0.028..184.537 rows=150025 loops=1)
-> Index Scan Backward using table1_start_date on table1 t1 (cost=0.42..20439.74 rows=393216 width=20) (actual time=0.018..35.196 rows=
150025 loops=1)
-> Index Scan using table2_pkey on table2 t2 (cost=0.14..0.16 rows=1 width=520) (actual time=0.000..0.001 rows=1 loops=150025)
Index Cond: (id = t1.table2_id)
Planning time: 0.210 ms
Execution time: 191.357 ms

为什么查询时间从 32 毫秒增加到 191 毫秒?据我了解,LEFT JOIN 不会影响结果。因此,我们可以先从 table1 (LIMIT 25) 中选择 25 行,然后连接 table2 中的行查询的执行时间不会显着增加。没有一些棘手的条件可以破坏索引等的使用。

我不完全理解第二个查询的 EXPLAIN ANALYZE,但 postgres 分析器似乎决定“执行连接然后过滤”而不是“过滤然后连接”。这样查询就太慢了。有什么问题?

最佳答案

它只是不知道 limit 应该应用于 table1 而不是 join 的结果,所以它获取最少需要的行,即 150025 然后执行 150025 次循环在 table2 上。如果您对 table1 进行限制子选择,并将 table2 加入该子选择,您应该会得到想要的结果。

SELECT t1.*, t2.*
FROM (SELECT *
FROM table1
ORDER BY start_date DESC
LIMIT 25 OFFSET 150000) AS t1
LEFT JOIN table2 AS t2 ON t2.id = t1.table2_id;

关于sql - Postgres : why LEFT JOIN affects to query plan?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/47432387/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com