gpt4 book ai didi

未使用的 Postgresql 索引

转载 作者:行者123 更新时间:2023-11-29 13:03:32 25 4
gpt4 key购买 nike

我有一个非常奇怪的数据集,其中大表中的几条记录根本没有任何数据,但当它们这样做时,它是百分之几万条记录。我试图只选择有数据的记录,但我在索引使用方面遇到了一些问题。我知道您通常不能“强制”postgresql 使用某些索引,但在这种情况下它可以工作。

SELECT matches.id, count(frames.id) FROM matches LEFT JOIN frames ON frames.match_id = matches.id GROUP BY matches.id HAVING count(frames.id) > 0 ORDER BY count(frames.id) DESC;
id | count
----+--------
31 | 123363
28 | 121475
24 | 110155
21 | 108258
22 | 106837
25 | 89182
26 | 87104
27 | 86152
(8 rows)

SELECT matches.id, count(frames.id) FROM matches LEFT JOIN frames ON frames.match_id = matches.id GROUP BY matches.id HAVING count(frames.id) = 0 ORDER BY count(frames.id) DESC;
....
(568 rows)

我找到的两个解决方案是:

SELECT "matches".* FROM "matches" WHERE EXISTS (SELECT true FROM frames WHERE frames.match_id = matches.id LIMIT 1);
Time: 11697,645 ms


or

SELECT DISTINCT "matches".* FROM "matches" INNER JOIN "frames" ON "frames"."match_id" = "matches"."id"
Time: 879,325 ms

这两个查询似乎都没有使用框架表中 match_id 的索引。这是可以理解的,因为通常它不是很有选择性,不幸的是在这里它真的很有帮助。作为:

SET enable_seqscan = OFF;
SELECT "matches".* FROM "matches" WHERE (SELECT true FROM frames WHERE frames.match_id = matches.id LIMIT 1);
Time: 1,239 ms

解释查询:

EXPLAIN for: SELECT DISTINCT "matches".* FROM "matches" INNER JOIN "frames" ON "frames"."match_id" = "matches"."id"

QUERY PLAN
-----------------------------------------------------------------------------
HashAggregate (cost=59253.47..59256.38 rows=290 width=155)
-> Hash Join (cost=6.26..33716.73 rows=785746 width=155)
Hash Cond: (frames.match_id = matches.id)
-> Seq Scan on frames (cost=0.00..22906.46 rows=785746 width=4)
-> Hash (cost=4.45..4.45 rows=145 width=155)
-> Seq Scan on matches (cost=0.00..4.45 rows=145 width=155)
(6 rows)

解释:SELECT "matches".* FROM "matches"WHERE (EXISTS (SELECT id FROM frames WHERE frames.match_id = matches.id LIMIT 1)) 查询计划


Seq Scan on matches  (cost=0.00..41.17 rows=72 width=155)
Filter: (SubPlan 1)
SubPlan 1
-> Limit (cost=0.00..0.25 rows=1 width=4)
-> Seq Scan on frames (cost=0.00..24870.83 rows=98218 width=4)
Filter: (match_id = matches.id)

(6 行)

SET enable_seqscan = OFF;

EXPLAIN SELECT "matches".* FROM "matches"WHERE (SELECT true FROM frames WHERE frames.match_id = matches.id LIMIT 1); 查询计划


Seq Scan on matches  (cost=10000000000.00..10000000118.37 rows=72 width=155)
Filter: (SubPlan 1)
SubPlan 1
-> Limit (cost=0.00..0.79 rows=1 width=0)
-> Index Scan using index_frames_on_match_id on frames (cost=0.00..81762.68 rows=104066 width=0)
Index Cond: (match_id = matches.id)

(6 行)

关于如何调整查询以在此处使用索引的任何建议?也许还有其他方法可以检查执行时间接近 1 毫秒的 recrs 是否存在,我从索引中退出然后是 11 秒?

附言。我确实运行了 ANALYZE、VACUM ANALYZE,以及所有通常建议的改进索引使用的步骤。

编辑感谢 David Aldridge 指出 LIMIT 1 可能实际上阻碍了我现在得到的查询规划器:

SELECT "matches".* FROM "matches" WHERE EXISTS (SELECT true FROM frames WHERE frames.match_id = matches.id);
Time: 163,803 ms

有了计划:

EXPLAIN SELECT "matches".* FROM "matches" WHERE EXISTS (SELECT true FROM frames WHERE frames.match_id = matches.id);
QUERY PLAN
------------------------------------------------------------------------------------
Nested Loop (cost=25455.58..25457.90 rows=8 width=155)
-> HashAggregate (cost=25455.58..25455.66 rows=8 width=4)
-> Seq Scan on frames (cost=0.00..23374.26 rows=832526 width=4)
-> Index Scan using matches_pkey on matches (cost=0.00..0.27 rows=1 width=155)
Index Cond: (id = frames.match_id)
(5 rows)

仅索引版本仍然慢 100 倍(可能是因为 Seq Scan + Hash Aggregate 在仍在执行的帧上)

最佳答案

在基于 EXISTS 的替代方案中,LIMIT 子句是多余的,但可能对优化器没有帮助。

尝试:

SELECT "matches".*
FROM "matches"
WHERE EXISTS (SELECT 1
FROM frames
WHERE frames.match_id = matches.id);

关于未使用的 Postgresql 索引,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/21962337/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com