gpt4 book ai didi

postgresql - 当 Inner Select Index Cond 更快时,Postgres 使用 Hash Join 和 Seq Scan

转载 作者:行者123 更新时间:2023-12-03 08:32:17 25 4
gpt4 key购买 nike

当索引可用时,Postgres 在表跟踪上使用更重的 Seq Scan。第一个查询是最初的尝试,它使用顺序扫描,因此查询速度很慢。我尝试使用内部选择强制执行索引扫描,但 postgres 将其转换回具有几乎相同运行时间的有效相同查询。我最终从查询二的内部选择中复制了列表以进行第三个查询。最后 postgres 使用了索引扫描,这大大减少了运行时间。第三个查询在生产环境中不可行。什么会导致 postgres 使用最后一个查询计划?

(两个表都使用了真空)

表格

  • 跟踪(worker_id、localdatetime)总记录数:118664105
  • project_worker(id,project_id)总记录:12935

索引

  • 使用 btree (worker_id, localdatetime) 在 public.tracking 上创建索引tracking_worker_id_localdatetime_idx

查询

从跟踪中选择worker_id、localdatetime t JOIN project_worker pw ON t.worker_id = pw.id WHERE project_id = 68475018

Hash Join  (cost=29185.80..2638162.26 rows=19294218 width=16) (actual time=16.912..18376.032 rows=177681 loops=1)
Hash Cond: (t.worker_id = pw.id)
-> Seq Scan on tracking t (cost=0.00..2297293.86 rows=118716186 width=16) (actual time=0.004..8242.891 rows=118674660 loops=1)
-> Hash (cost=29134.80..29134.80 rows=4080 width=8) (actual time=16.855..16.855 rows=2102 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 115kB
-> Seq Scan on project_worker pw (cost=0.00..29134.80 rows=4080 width=8) (actual time=0.004..16.596 rows=2102 loops=1)
Filter: (project_id = 68475018)
Rows Removed by Filter: 10833
Planning Time: 0.192 ms
Execution Time: 18382.698 ms

从跟踪中选择worker_id、localdatetime,其中worker_id IN(从project_worker中选择id,WHERE project_id = 68475018 LIMIT 500)

Hash Semi Join  (cost=6905.32..2923969.14 rows=27733254 width=24) (actual time=19.715..20191.517 rows=20530 loops=1)
Hash Cond: (t.worker_id = project_worker.id)
-> Seq Scan on tracking t (cost=0.00..2296948.27 rows=118698327 width=24) (actual time=0.005..9184.676 rows=118657026 loops=1)
-> Hash (cost=6899.07..6899.07 rows=500 width=8) (actual time=1.103..1.103 rows=500 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 28kB
-> Limit (cost=0.00..6894.07 rows=500 width=8) (actual time=0.006..1.011 rows=500 loops=1)
-> Seq Scan on project_worker (cost=0.00..28982.65 rows=2102 width=8) (actual time=0.005..0.968 rows=500 loops=1)
Filter: (project_id = 68475018)
Rows Removed by Filter: 4493
Planning Time: 0.224 ms
Execution Time: 20192.421 ms

从跟踪中选择worker_id、localdatetime,其中worker_id IN (322016383,316007840,...,285702579)

Index Scan using tracking_worker_id_localdatetime_idx on tracking t  (cost=0.57..4766798.31 rows=21877360 width=24) (actual time=0.079..29.756 rows=22112 loops=1)
" Index Cond: (worker_id = ANY ('{322016383,316007840,...,285702579}'::bigint[]))"
Planning Time: 1.162 ms
Execution Time: 30.884 ms

... 代替查询中使用的 500 个 id 条目

同一查询在另一组 500 个 ID 上运行

Index Scan using tracking_worker_id_localdatetime_idx on tracking t  (cost=0.57..4776714.91 rows=21900980 width=24) (actual time=0.105..5528.109 rows=117838 loops=1)
" Index Cond: (worker_id = ANY ('{286237712,286237844,...,216724213}'::bigint[]))"
Planning Time: 2.105 ms
Execution Time: 5534.948 ms

最佳答案

如果您想将 PostgreSQL 推向嵌套循环连接,请尝试以下操作:

  • 跟踪上创建可用于仅索引扫描的索引:

    CREATE INDEX ON tracking (worker_id) INCLUDE (localdatetime);

    确保经常对跟踪进行VACUUM编辑,以便仅索引扫描有效。

  • 减少random_page_cost并增加 effective_cache_size,以便优化器降低索引扫描价格(但不要使用疯狂的值)。

  • 确保您对 project_worker 有良好的估计:

    ALTER TABLE project_worker ALTER project_id SET STATISTICS 1000;
    ANALYZE project_worker;

关于postgresql - 当 Inner Select Index Cond 更快时,Postgres 使用 Hash Join 和 Seq Scan,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/64808904/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com