gpt4 book ai didi

sql - 为什么左连接会导致优化器忽略索引?

转载 作者:行者123 更新时间:2023-11-29 11:22:31 25 4
gpt4 key购买 nike

使用 postgres 9.6.11,我的架构如下:

所有者:

id: BIGINT (PK)
dog_id: BIGINT NOT NULL (FK)
cat_id: BIGINT NULL (FK)

index DOG_ID_IDX (dog_id)
index CAT_ID_IDX (cat_id)

动物:

id: BIGINT (PK)
name: VARCHAR(50) NOT NULL

index NAME_IDX (name)

在一些示例数据中:

所有者表:

| id | dog_id | cat_id |
| -- | ------ | ------ |
| 1 | 100 | 200 |
| 2 | 101 | NULL |

动物表:

| id  | name     |
| --- | -------- |
| 100 | "fluffy" |
| 101 | "rex" |
| 200 | "tom" |

我需要执行的一个常见查询是通过宠物名称查找主人,我想通过如下查询来完成:

select *
from owner o
join animal dog on o.dog_id = dog.id
left join animal cat on o.cat_id = cat.id
where dog.name = "fluffy" or cat.name = "fluffy";

但是我不明白从中得到的计划:

Hash Join  (cost=30304.51..77508.31 rows=3 width=899)
Hash Cond: (dog.id = owner.dog_id)
Join Filter: (((dog.name)::text = 'fluffy'::text) OR ((cat.name)::text = 'fluffy'::text))
-> Seq Scan on animal dog (cost=0.00..17961.23 rows=116623 width=899)
-> Hash (cost=28208.65..28208.65 rows=114149 width=19)
-> Hash Left Join (cost=20103.02..28208.65 rows=114149 width=19)
Hash Cond: (owner.cat_id = cat.id)
-> Seq Scan on owner o (cost=0.00..5849.49 rows=114149 width=16)
-> Hash (cost=17961.23..17961.23 rows=116623 width=19)
-> Seq Scan on animal cat (cost=0.00..17961.23 rows=116623 width=19)

我不明白为什么查询计划要进行顺序扫描。我认为优化器会足够聪明,使用 name 索引扫描一次 animal 表,甚至两次,然后根据这个结果连接回 owner 表,但我最终得到了一个非常出乎意料的查询计划。

我举了一个更简单的例子,我们只想查找狗的名字,查询的行为符合我的预期:

select *
from owner o
join animal dog on o.dog_id = dog.id
where dog.name = "fluffy";

此查询生成一个我理解的计划,使用 animal.name 上的索引:

Nested Loop  (cost=0.83..16.88 rows=1 width=1346)
-> Index Scan using DOG_ID_IDX on animal dog (cost=0.42..8.44 rows=1 width=899)
Index Cond: ((name)::text = 'fluffy'::text)
-> Index Scan using dog_id on owner o (cost=0.42..8.44 rows=1 width=447)
Index Cond: (dog_id = b.id)

即使使用两个内部联接进行查询也会生成我期望的查询计划:

select * 
from owner o
join animal dog on o.dog_id = dog.id
join animal cat on o.cat_id = cat.id
where dog.name = 'fluffy' or cat.name = 'fluffy';
Merge Join  (cost=35726.09..56215.53 rows=3 width=2245)
Merge Cond: (owner.cat_id = cat.id)
Join Filter: (((dog.name)::text = 'fluffy'::text) OR ((cat.name)::text = 'fluffy'::text))
-> Nested Loop (cost=0.83..132348.38 rows=114149 width=1346)
-> Index Scan using CAT_ID_IDX on owner o (cost=0.42..11616.07 rows=114149 width=447)
-> Index Scan using animal_pkey on animal dog (cost=0.42..1.05 rows=1 width=899)
Index Cond: (id = owner.dog_id)
-> Index Scan using animal_pkey on animal cat (cost=0.42..52636.91 rows=116623 width=899)

所以看起来 animal 的左连接导致优化器忽略索引。

为什么对 animal 进行额外的左连接似乎会导致优化器忽略索引?

编辑:EXPLAIN(分析、缓冲)产量:

Hash Left Join  (cost=32631.95..150357.57 rows=3 width=2245) (actual time=6696.935..6696.936 rows=0 loops=1)
Hash Cond: (o.cat_id = cat.id)
Filter: (((dog.name)::text = 'fluffy'::text) OR ((cat.name)::text = 'fluffy'::text))
Rows Removed by Filter: 114219
Buffers: shared hit=170464 read=18028 dirtied=28, temp read=13210 written=13148
-> Merge Join (cost=0.94..65696.37 rows=114149 width=1346) (actual time=1.821..860.643 rows=114219 loops=1)
Merge Cond: (o.dog_id = dog.id)
Buffers: shared hit=170286 read=1408 dirtied=28
-> Index Scan using DOG_ID_IDX on owner o (cost=0.42..11402.48 rows=114149 width=447) (actual time=1.806..334.431 rows=114219 loops=1)
Buffers: shared hit=84787 read=783 dirtied=13
-> Index Scan using animal_pkey on animal dog (cost=0.42..52636.91 rows=116623 width=899) (actual time=0.006..300.507 rows=116977 loops=1)
Buffers: shared hit=85499 read=625 dirtied=15
-> Hash (cost=17961.23..17961.23 rows=116623 width=899) (actual time=5626.780..5626.780 rows=116977 loops=1)
Buckets: 8192 Batches: 32 Memory Usage: 3442kB
Buffers: shared hit=175 read=16620, temp written=12701
-> Seq Scan on animal cat (cost=0.00..17961.23 rows=116623 width=899) (actual time=2.519..5242.106 rows=116977 loops=1)
Buffers: shared hit=175 read=16620
Planning time: 1.245 ms
Execution time: 6697.357 ms

最佳答案

left join 需要保留第一个表中的所有行。因此,它通常会扫描该表,甚至 where 条件也会根据这些条件过滤其他表。

Postgres 生成的查询计划并不奇怪。

关于sql - 为什么左连接会导致优化器忽略索引?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55816269/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com