
postgresql - Postgres window function triples query time


I'm using Postgres 10, and have the following query:

select
    count(task.id) over() as _total_,
    json_agg(u.*) as users,
    task.*
from task
left outer join taskuserlink_history tu on (task.id = tu.taskid)
left outer join "user" u on (tu.userId = u.id)
group by task.id
offset 10 limit 10;

This query takes about 800 ms to execute.

If I remove the count(task.id) over() as _total_, line, then it executes in 250 ms.

I have to admit I'm a complete SQL noob, so the query itself may well be flawed.

I was wondering if anyone could point out flaws in the query, and make suggestions on how to speed it up.

The number of tasks is around 15k, and each task has on average 5 users, linked through taskuserlink.

I have looked at the pgAdmin "explain" graph,

[screenshot of the pgAdmin EXPLAIN graph]

but honestly I can't really make sense of it yet ;)
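The text plans further down can be produced with EXPLAIN (ANALYZE, BUFFERS), which prints per-node timing and buffer statistics; the exact invocation below is an assumption, since the question only shows the output:

-- Prints the executed plan as text, with actual times and buffer
-- counts, in the same format as the plans quoted below.
EXPLAIN (ANALYZE, BUFFERS)
select
    count(task.id) over() as _total_,
    json_agg(u.*) as users,
    task.*
from task
left outer join taskuserlink_history tu on (task.id = tu.taskid)
left outer join "user" u on (tu.userId = u.id)
group by task.id
offset 10 limit 10;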

The table definitions are:

task, with id (int) as the primary key column

taskuserlink_history, with taskId (int) and userId (int) (both foreign-key constrained, with indexes)

user, with id (int) as the primary key column
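A minimal DDL sketch of the schema as described, for orientation only; everything beyond the stated columns, keys, and indexes is an assumption (the timeend column is inferred from the Filter in the plans below, and the plans also suggest "user" is really a view over a user_archive table):

-- Hypothetical reconstruction of the described schema; only the
-- columns, primary keys, foreign keys, and indexes mentioned in the
-- question are taken as given.
create table task (
    id integer primary key
);

create table "user" (
    id integer primary key
);

create table taskuserlink_history (
    taskId  integer references task (id),
    userId  integer references "user" (id),
    timeend timestamptz            -- inferred from "Filter: (timeend IS NULL)"
);

create index on taskuserlink_history (taskId);
create index on taskuserlink_history (userId);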

The query plan is as follows:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=4.74..12.49 rows=10 width=44) (actual time=1178.016..1178.043 rows=10 loops=1)
  Buffers: shared hit=3731, temp read=6655 written=6914
  ->  WindowAgg  (cost=4.74..10248.90 rows=13231 width=44) (actual time=1178.014..1178.040 rows=10 loops=1)
        Buffers: shared hit=3731, temp read=6655 written=6914
        ->  GroupAggregate  (cost=4.74..10083.51 rows=13231 width=36) (actual time=0.417..1049.294 rows=13255 loops=1)
              Group Key: task.id
              Buffers: shared hit=3731
              ->  Nested Loop Left Join  (cost=4.74..9586.77 rows=66271 width=36) (actual time=0.103..309.372 rows=66162 loops=1)
                    Join Filter: (taskuserlink_history.userid = user_archive.id)
                    Rows Removed by Join Filter: 1182904
                    Buffers: shared hit=3731
                    ->  Merge Left Join  (cost=0.58..5563.22 rows=66271 width=8) (actual time=0.044..73.598 rows=66162 loops=1)
                          Merge Cond: (task.id = taskuserlink_history.taskid)
                          Buffers: shared hit=3629
                          ->  Index Only Scan using task_pkey on task  (cost=0.29..1938.30 rows=13231 width=4) (actual time=0.026..7.683 rows=13255 loops=1)
                                Heap Fetches: 13255
                                Buffers: shared hit=1810
                          ->  Index Scan using taskuserlink_history_task_fk_idx on taskuserlink_history  (cost=0.29..2764.46 rows=66271 width=8) (actual time=0.015..40.109 rows=66162 loops=1)
                                Filter: (timeend IS NULL)
                                Rows Removed by Filter: 13368
                                Buffers: shared hit=1819
                    ->  Materialize  (cost=4.17..50.46 rows=4 width=36) (actual time=0.000..0.001 rows=19 loops=66162)
                          Buffers: shared hit=102
                          ->  Bitmap Heap Scan on user_archive  (cost=4.17..50.44 rows=4 width=36) (actual time=0.050..0.305 rows=45 loops=1)
                                Recheck Cond: (archived_at IS NULL)
                                Heap Blocks: exact=11
                                Buffers: shared hit=102
                                ->  Bitmap Index Scan on user_unique_username  (cost=0.00..4.16 rows=4 width=0) (actual time=0.014..0.014 rows=46 loops=1)
                                      Buffers: shared hit=1
                                SubPlan 1
                                  ->  Aggregate  (cost=8.30..8.31 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=45)
                                        Buffers: shared hit=90
                                        ->  Index Scan using task_assignedto_idx on task task_1  (cost=0.29..8.30 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=45)
                                              Index Cond: (assignedtoid = user_archive.id)
                                              Buffers: shared hit=90
Planning time: 0.989 ms
Execution time: 1191.451 ms
(37 rows)

Without the window function it is:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=4.74..12.36 rows=10 width=36) (actual time=0.510..1.763 rows=10 loops=1)
  Buffers: shared hit=91
  ->  GroupAggregate  (cost=4.74..10083.51 rows=13231 width=36) (actual time=0.509..1.759 rows=10 loops=1)
        Group Key: task.id
        Buffers: shared hit=91
        ->  Nested Loop Left Join  (cost=4.74..9586.77 rows=66271 width=36) (actual time=0.073..0.744 rows=50 loops=1)
              Join Filter: (taskuserlink_history.userid = user_archive.id)
              Rows Removed by Join Filter: 361
              Buffers: shared hit=91
              ->  Merge Left Join  (cost=0.58..5563.22 rows=66271 width=8) (actual time=0.029..0.161 rows=50 loops=1)
                    Merge Cond: (task.id = taskuserlink_history.taskid)
                    Buffers: shared hit=7
                    ->  Index Only Scan using task_pkey on task  (cost=0.29..1938.30 rows=13231 width=4) (actual time=0.016..0.031 rows=11 loops=1)
                          Heap Fetches: 11
                          Buffers: shared hit=4
                    ->  Index Scan using taskuserlink_history_task_fk_idx on taskuserlink_history  (cost=0.29..2764.46 rows=66271 width=8) (actual time=0.009..0.081 rows=50 loops=1)
                          Filter: (timeend IS NULL)
                          Rows Removed by Filter: 11
                          Buffers: shared hit=3
              ->  Materialize  (cost=4.17..50.46 rows=4 width=36) (actual time=0.001..0.009 rows=8 loops=50)
                    Buffers: shared hit=84
                    ->  Bitmap Heap Scan on user_archive  (cost=4.17..50.44 rows=4 width=36) (actual time=0.040..0.382 rows=38 loops=1)
                          Recheck Cond: (archived_at IS NULL)
                          Heap Blocks: exact=7
                          Buffers: shared hit=84
                          ->  Bitmap Index Scan on user_unique_username  (cost=0.00..4.16 rows=4 width=0) (actual time=0.012..0.012 rows=46 loops=1)
                                Buffers: shared hit=1
                          SubPlan 1
                            ->  Aggregate  (cost=8.30..8.31 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=38)
                                  Buffers: shared hit=76
                                  ->  Index Scan using task_assignedto_idx on task task_1  (cost=0.29..8.30 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=38)
                                        Index Cond: (assignedtoid = user_archive.id)
                                        Buffers: shared hit=76
Planning time: 0.895 ms
Execution time: 1.890 ms
(35 rows)

Best Answer

I believe the LIMIT clause is at play here. LIMIT restricts the number of rows returned, not necessarily the work involved:

  • Your second query can abort early once 20 rows have been built (10 for the OFFSET, 10 for the LIMIT).
  • Your first query, however, has to run through the entire set to calculate count(task.id) (see the sketch after this list).
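One common workaround, sketched here as a suggestion rather than as part of the original answer, is to drop the window function and fetch the total once in a scalar subquery. Since the query groups by task.id, the window count equals the number of tasks, so the two should agree while the page itself can again stop early:

-- Sketch: compute the total once, instead of forcing the WindowAgg
-- to consume every grouped row before LIMIT can apply.
select
    (select count(*) from task) as _total_,
    json_agg(u.*) as users,
    task.*
from task
left outer join taskuserlink_history tu on (task.id = tu.taskid)
left outer join "user" u on (tu.userId = u.id)
group by task.id
offset 10 limit 10;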

Not what you asked, but I'll say it anyway:

  • "user" is not a table but a view. That is where both queries are actually slower than they need to be (the "Materialize" in the plans).
  • Paging with OFFSET asks for trouble, because it gets slower as the OFFSET grows (the keyset sketch after this list avoids that).
  • Using OFFSET and LIMIT without an ORDER BY is most likely not what you want: consecutive calls are not guaranteed to return the same result set.
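To illustrate the last two points, here is a keyset-pagination sketch; it is not part of the original answer, and it assumes pages are ordered by task.id:

-- Keyset pagination: start each page right after the last task.id
-- already seen, so page cost stays flat as you page forward, and the
-- ORDER BY makes consecutive calls return consistent pages.
select
    json_agg(u.*) as users,
    task.*
from task
left outer join taskuserlink_history tu on (task.id = tu.taskid)
left outer join "user" u on (tu.userId = u.id)
where task.id > 10          -- last id from the previous page
group by task.id
order by task.id
limit 10;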

Regarding postgresql - Postgres window function triples query time, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50061080/
