
sql - How reliable are the cost measurements in PostgreSQL EXPLAIN plans?


The queries are executed on a large table containing 11 million rows. Before running the queries, I ran ANALYZE on the table.
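
A minimal sketch of that preparation step, assuming only the table name from the question (ANALYZE refreshes the statistics the planner uses for its row and cost estimates):

-- Refresh the planner statistics for the table before comparing plans
ANALYZE accounts;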

Query 1:

SELECT *
FROM accounts t1
LEFT OUTER JOIN accounts t2
ON (t1.account_no = t2.account_no
AND t1.effective_date < t2.effective_date)
WHERE t2.account_no IS NULL;

EXPLAIN ANALYZE:

Hash Anti Join  (cost=480795.57..1201111.40 rows=7369854 width=292) (actual time=29619.499..115662.111 rows=1977871 loops=1)
  Hash Cond: ((t1.account_no)::text = (t2.account_no)::text)
  Join Filter: ((t1.effective_date)::text < (t2.effective_date)::text)
  ->  Seq Scan on accounts t1  (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.025..25693.921 rows=11034070 loops=1)
  ->  Hash  (cost=342610.81..342610.81 rows=11054781 width=146) (actual time=29612.925..29612.925 rows=11034070 loops=1)
        Buckets: 2097152  Batches: 1  Memory Usage: 1834187kB
        ->  Seq Scan on accounts t2  (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.006..22929.635 rows=11034070 loops=1)
Total runtime: 115870.788 ms

The estimated cost is ~1.2 million, and the actual time taken is ~1.9 minutes.

Query 2:

SELECT t1.*
FROM accounts t1
LEFT OUTER JOIN accounts t2
ON (t1.account_no = t2.account_no
AND t1.effective_date < t2.effective_date)
WHERE t2.account_no IS NULL;

EXPLAIN ANALYZE:

Hash Anti Join  (cost=480795.57..1201111.40 rows=7369854 width=146) (actual time=13365.808..65519.402 rows=1977871 loops=1)
  Hash Cond: ((t1.account_no)::text = (t2.account_no)::text)
  Join Filter: ((t1.effective_date)::text < (t2.effective_date)::text)
  ->  Seq Scan on accounts t1  (cost=0.00..342610.81 rows=11054781 width=146) (actual time=0.007..5032.778 rows=11034070 loops=1)
  ->  Hash  (cost=342610.81..342610.81 rows=11054781 width=18) (actual time=13354.219..13354.219 rows=11034070 loops=1)
        Buckets: 2097152  Batches: 1  Memory Usage: 545369kB
        ->  Seq Scan on accounts t2  (cost=0.00..342610.81 rows=11054781 width=18) (actual time=0.011..8964.571 rows=11034070 loops=1)
Total runtime: 65705.707 ms

The estimated cost is ~1.2 million (again), but the actual time taken is <1.1 minutes.

Query 3:

SELECT *
FROM accounts
WHERE (account_no, effective_date) IN
  (SELECT account_no, max(effective_date)
   FROM accounts
   GROUP BY account_no);

EXPLAIN ANALYZE:

Nested Loop  (cost=406416.19..502216.84 rows=2763695 width=146) (actual time=31779.457..917543.228 rows=1977871 loops=1)
  ->  HashAggregate  (cost=406416.19..406757.45 rows=34126 width=43) (actual time=31774.877..33378.968 rows=1977425 loops=1)
        ->  Subquery Scan on "ANY_subquery"  (cost=397884.72..404709.90 rows=341259 width=43) (actual time=27979.226..29841.217 rows=1977425 loops=1)
              ->  HashAggregate  (cost=397884.72..401297.31 rows=341259 width=18) (actual time=27979.224..29315.346 rows=1977425 loops=1)
                    ->  Seq Scan on accounts  (cost=0.00..342610.81 rows=11054781 width=18) (actual time=0.851..16092.755 rows=11034070 loops=1)
  ->  Index Scan using accounts_idx2 on accounts  (cost=0.00..2.78 rows=1 width=146) (actual time=0.443..0.445 rows=1 loops=1977425)
        Index Cond: (((account_no)::text = ("ANY_subquery".account_no)::text) AND ((effective_date)::text = "ANY_subquery".max))
Total runtime: 918039.614 ms

The estimated cost is ~502,000, but the actual time taken is ~15.3 minutes!
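
Putting the three runs side by side (all numbers taken from the plans above), the plan with the lowest estimated cost is by far the slowest:

Query   Estimated total cost   Actual runtime
1       ~1,201,111             115.9 s  (~1.9 min)
2       ~1,201,111             65.7 s   (~1.1 min)
3       ~502,217               918.0 s  (~15.3 min)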

  • How reliable is the EXPLAIN output?
  • Do we always need to EXPLAIN ANALYZE to find out how our queries will actually perform on real data, rather than trusting how much the query planner thinks they will cost?

Best answer

They are reliable, except when they're not. You can't really generalize.

It looks like the planner is dramatically underestimating the number of distinct account_no values it will find (it expected 34,126 but actually found 1,977,425). Your default_statistics_target may not be high enough to get a good estimate for this column.
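
A minimal sketch of how that could be adjusted, assuming the table and column names from the question (the value 1000 is only an illustrative choice); the planner only picks up the new target after the column is re-analyzed:

-- Raise the per-column statistics target for account_no (illustrative value)
ALTER TABLE accounts ALTER COLUMN account_no SET STATISTICS 1000;
ANALYZE accounts;

-- Alternatively, raise default_statistics_target for the session (or in postgresql.conf) and re-analyze
SET default_statistics_target = 1000;
ANALYZE accounts;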

Regarding sql - How reliable are the cost measurements in PostgreSQL EXPLAIN plans?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21154780/
