gpt4 book ai didi

performance - Postgres : Why is the performance so bad on subselects with Offset/Limit

转载 作者:行者123 更新时间:2023-11-29 11:47:46 24 4
gpt4 key购买 nike

你能帮我理解这些语句之间性能下降的原因吗?

对我来说,在 D & E 的情况下,他首先将地址加入所有订阅者,最后应用 Offset & Limit。他到底为什么要这么做?

我是否遗漏了有关 Subselects 和 Offset 如何协同工作的信息?他不应该先找到正确的偏移量然后开始执行子选择吗?

user_idaddress_id 是主键

选择 A:15 毫秒(确定):选择前 200 个订阅者

SELECT s.user_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 0 LIMIT 200

选择 B:45 毫秒(确定):选择最后 200 个订阅者

SELECT s.user_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200

选择 C:15 毫秒(确定):选择前 200 个订阅者以及第一个可用地址

SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 0 LIMIT 200

选择 D:500 毫秒(不正确):选择最后 200 个订阅者以及第一个可用地址

SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200

选择 E:1000 毫秒(甚至更糟):选择最后 200 个订阅者以及前 2 个可用地址

SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id_1,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 1 LIMIT 2) AS a_id_2
FROM subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200

选择 F:15 毫秒(不错):选择最后 200 个订阅者以及前 2 个可用地址,没有偏移量,而是 WHERE s.user_id > 100385

SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id_1,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 1 LIMIT 2) AS a_id_2
FROM subscribers s
WHERE s.user_id > 100385 --same as OFFSET 100000 in my data
ORDER BY s.user_id
LIMIT 200

E 的执行计划:

Visual Plan

'Limit  (cost=1677635.30..1677635.80 rows=200 width=4) (actual time=2251.503..2251.816 rows=200 loops=1)'
' Output: s.user_id, ((SubPlan 1)), ((SubPlan 2))'
' Buffers: shared hit=607074'
' -> Sort (cost=1677385.30..1677636.08 rows=100312 width=4) (actual time=2146.867..2200.704 rows=100200 loops=1)'
' Output: s.user_id, ((SubPlan 1)), ((SubPlan 2))'
' Sort Key: s.user_id'
' Sort Method: quicksort Memory: 7775kB'
' Buffers: shared hit=607074'
' -> Seq Scan on public.pcv_subscriber s (cost=0.00..1669052.31 rows=100312 width=4) (actual time=0.040..2046.926 rows=100312 loops=1)'
' Output: s.user_id, (SubPlan 1), (SubPlan 2)'
' Buffers: shared hit=607074'
' SubPlan 1'
' -> Limit (cost=8.29..8.29 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=100312)'
' Output: ua.user_address_id'
' Buffers: shared hit=301458'
' -> Sort (cost=8.29..8.29 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=100312)'
' Output: ua.user_address_id'
' Sort Key: ua.user_address_id'
' Sort Method: quicksort Memory: 25kB'
' Buffers: shared hit=301458'
' -> Index Scan using ix_pcv_user_address_user_id on public.pcv_user_address ua (cost=0.00..8.28 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=100312)'
' Output: ua.user_address_id'
' Index Cond: (ua.user_id = $0)'
' Buffers: shared hit=301458'
' SubPlan 2'
' -> Limit (cost=8.29..8.29 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=100312)'
' Output: ua.user_address_id'
' Buffers: shared hit=301458'
' -> Sort (cost=8.29..8.29 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=100312)'
' Output: ua.user_address_id'
' Sort Key: ua.user_address_id'
' Sort Method: quicksort Memory: 25kB'
' Buffers: shared hit=301458'
' -> Index Scan using ix_pcv_user_address_user_id on public.pcv_user_address ua (cost=0.00..8.28 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=100312)'
' Output: ua.user_address_id'
' Index Cond: (ua.user_id = $0)'
' Buffers: shared hit=301458'
'Total runtime: 2251.968 ms'

免责声明:这是一个更大、更复杂的语句的精简示例,它使 GUI 表能够对具有跨多个表的大量额外累积数据的订阅者进行排序/分页/过滤。所以我知道这个例子可以用更好的方式来完成。因此,请帮助我理解为什么此解决方案如此缓慢或最多建议进行最小的更改。

更新 1:

这是使用 Postgres 9.0.3 制作的

更新 2:

目前我能想到的最好的解决方案似乎是这个愚蠢的声明:

选择 G:73ms(OKish)

SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id_1,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 1 LIMIT 2) AS a_id_2
FROM subscribers s
WHERE s.user_id >= (SELECT user_id from subscribers ORDER BY user_id OFFSET 100000 LIMIT 1)
ORDER BY s.user_id
LIMIT 200

更新 3:

David 迄今为止的最佳选择。 (与 G 性能相同但更直观)

选择 H:73ms(OKish)

SELECT s2.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM (SELECT s.user_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200) s2

H 的执行计划:

这就是我最初对 E 的想象。 enter image description here

最佳答案

我认为即使对于您未包含在最终数据集中的 100000 行,也会执行 SELECT 子句中表达的连接。

这个怎么样:

SELECT s2.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM (select *
from subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200) s2

做不到这一点,尝试一个通用的表表达式:

With s2 as (
select *
from subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200)
SELECT s2.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM s2

关于performance - Postgres : Why is the performance so bad on subselects with Offset/Limit,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/29307991/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com