gpt4 book ai didi

postgresql - 这个self-join可以进一步优化吗?

转载 作者:行者123 更新时间:2023-11-29 14:26:12 26 4
gpt4 key购买 nike

我正在尝试了解是否可以优化包含自连接的查询,如果可能 - 如何优化。

我正在处理一个更大的现实生活任务,但在这里我从中提取了一个简单的子任务以专注于一个特定问题:优化自连接查询。

我有一个名为parties 的表。它包含超过 85k 条记录,如下所示:

# \d test.parties
Table "test.parties"
Column | Type | Collation | Nullable | Default
-------------+------+-----------+----------+---------
id | uuid | | |
contract_id | uuid | | |

contract_id 上进行自连接我得到了这个计划:

# explain analyse select p1.id from test.parties p1 join test.parties p2 on p1.contract_id = p2.contract_id;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=20207.87..628157.87 rows=40500000 width=16) (actual time=109.709..184.523 rows=197632 loops=1)
Merge Cond: (p1.contract_id = p2.contract_id)
-> Sort (cost=11181.94..11406.94 rows=90000 width=32) (actual time=55.560..66.173 rows=86332 loops=1)
Sort Key: p1.contract_id
Sort Method: external merge Disk: 3560kB
-> Seq Scan on parties p1 (cost=0.00..1620.00 rows=90000 width=32) (actual time=0.018..14.518 rows=86332 loops=1)
-> Sort (cost=9025.94..9250.94 rows=90000 width=16) (actual time=54.135..74.973 rows=197631 loops=1)
Sort Key: p2.contract_id
Sort Method: external sort Disk: 2544kB
-> Seq Scan on parties p2 (cost=0.00..1620.00 rows=90000 width=16) (actual time=0.009..10.462 rows=86332 loops=1)
Planning Time: 0.167 ms
Execution Time: 199.677 ms
(12 rows)

contract_id 上添加索引我得到了这个计划:

# create index on test.parties(contract_id);
CREATE INDEX
# explain analyse select p1.id from test.parties p1 join test.parties p2 on p1.contract_id = p2.contract_id;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3084.47..10570.76 rows=192484 width=16) (actual time=32.457..97.662 rows=197632 loops=1)
Hash Cond: (p1.contract_id = p2.contract_id)
-> Seq Scan on parties p1 (cost=0.00..1583.32 rows=86332 width=32) (actual time=0.013..11.293 rows=86332 loops=1)
-> Hash (cost=1583.32..1583.32 rows=86332 width=16) (actual time=32.133..32.133 rows=86332 loops=1)
Buckets: 131072 Batches: 2 Memory Usage: 3048kB
-> Seq Scan on parties p2 (cost=0.00..1583.32 rows=86332 width=16) (actual time=0.007..12.815 rows=86332 loops=1)
Planning Time: 0.444 ms
Execution Time: 110.692 ms
(8 rows)

有没有办法摆脱那些Seq Scan

最佳答案

我在你的解释计划中没有看到任何索引的存在,所以指定你还没有研究使用索引,这是一个建议:

CREATE INDEX idx ON parties (contract_id, id);

这应该会加快连接速度,并且它还涵盖 SELECT 子句中所需的 id 值。

关于postgresql - 这个self-join可以进一步优化吗?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57593355/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com