sql - Indexing columns across multiple tables in PostgreSQL

I'm trying to optimize the following join query:

A notification is a record of whether a user has read a given activity. One notification points to one activity, but many users can be notified about the same activity. The activity record has a few columns, such as the workspace the activity belongs to and the activity type.

This query fetches a user's read, non-comment notifications in a particular workspace, ordered by time.
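
For reference, here is a rough sketch of how the two tables might be defined, reconstructed from the query and the plans below; the column types and anything not mentioned above are assumptions:

-- Hypothetical schema reconstructed from the query and plans; column types
-- and enum values other than 'commented' are assumptions.
create type activity_type as enum ('commented', 'created');

create table activity (
    id       uuid primary key,        -- scanned via activity_pkey in the plan
    space_id uuid not null,           -- the workspace the activity belongs to
    type     activity_type not null,
    end_time timestamptz not null
);

create table notification (
    id          uuid primary key,
    user_id     uuid not null,
    activity_id uuid not null references activity (id),
    read        boolean not null default false
);

create index notification_user_id_idx on notification (user_id);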

explain analyze
select activity.id from activity, notification
where notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.read = true
and notification.activity_id = activity.id
and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
and activity.type != 'commented'
order by activity.end_time desc
limit 20;

The problem is that this query has to walk through every notification the user has ever received.

Limit  (cost=4912.35..4912.36 rows=1 width=24) (actual time=138.767..138.779 rows=20 loops=1)
  ->  Sort (cost=4912.35..4912.36 rows=1 width=24) (actual time=138.766..138.770 rows=20 loops=1)
        Sort Key: activity.end_time DESC
        Sort Method: top-N heapsort Memory: 27kB
        ->  Nested Loop (cost=32.57..4912.34 rows=1 width=24) (actual time=1.354..138.606 rows=447 loops=1)
              ->  Bitmap Heap Scan on notification (cost=32.01..3847.48 rows=124 width=16) (actual time=1.341..6.639 rows=1218 loops=1)
                    Recheck Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
                    Filter: read
                    Rows Removed by Filter: 4101
                    Heap Blocks: exact=4774
                    ->  Bitmap Index Scan on notification_user_id_idx (cost=0.00..31.98 rows=988 width=0) (actual time=0.719..0.719 rows=5355 loops=1)
                          Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
              ->  Index Scan using activity_pkey on activity (cost=0.56..8.59 rows=1 width=24) (actual time=0.108..0.108 rows=0 loops=1218)
                    Index Cond: (id = notification.activity_id)
                    Filter: ((type <> 'commented'::activity_type) AND (space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid))
                    Rows Removed by Filter: 1
Planning time: 0.428 ms
Execution time: 138.825 ms

Edit: here is the performance once the cache has warmed up.

Limit  (cost=4912.35..4912.36 rows=1 width=24) (actual time=13.618..13.629 rows=20 loops=1)
  ->  Sort (cost=4912.35..4912.36 rows=1 width=24) (actual time=13.617..13.621 rows=20 loops=1)
        Sort Key: activity.end_time DESC
        Sort Method: top-N heapsort Memory: 27kB
        ->  Nested Loop (cost=32.57..4912.34 rows=1 width=24) (actual time=1.365..13.447 rows=447 loops=1)
              ->  Bitmap Heap Scan on notification (cost=32.01..3847.48 rows=124 width=16) (actual time=1.352..6.606 rows=1218 loops=1)
                    Recheck Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
                    Filter: read
                    Rows Removed by Filter: 4101
                    Heap Blocks: exact=4774
                    ->  Bitmap Index Scan on notification_user_id_idx (cost=0.00..31.98 rows=988 width=0) (actual time=0.729..0.729 rows=5355 loops=1)
                          Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
              ->  Index Scan using activity_pkey on activity (cost=0.56..8.59 rows=1 width=24) (actual time=0.005..0.005 rows=0 loops=1218)
                    Index Cond: (id = notification.activity_id)
                    Filter: ((type <> 'commented'::activity_type) AND (space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid))
                    Rows Removed by Filter: 1
Planning time: 0.438 ms
Execution time: 13.673 ms

I could create a multicolumn index on user_id and read, but that doesn't address the problem I'm trying to solve.
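
For illustration, such an index would look something like this (the index name is hypothetical):

-- Hypothetical index: narrows the notification scan to read = true rows for a
-- user, but the filter and sort on the activity columns still happen elsewhere.
create index notification_user_id_read_idx on notification (user_id, read);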

I could work around this myself by manually denormalizing the data and adding space_id, type, and end_time columns to the notification record, but that seems like it shouldn't be necessary.

I was hoping Postgres could create an index that spans two tables, but everything I've read so far suggests that isn't possible.

So my question is: what is the best way to optimize this query?


Edit: after creating the suggested indexes:

create index tmp_index_1 on activity using btree (
space_id,
id,
end_time
) where (
type != 'commented'
);

create index tmp_index_2 on notification using btree (
user_id,
activity_id
) where (
read = true
);

Query performance improved by about 3x.

explain analyse
select activity.id from activity
INNER JOIN notification ON notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.read = true
and notification.activity_id = activity.id
and activity.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
and activity.type != 'commented'
order by activity.end_time desc
limit 20;

Limit (cost=955.26..955.27 rows=1 width=24) (actual time=4.386..4.397 rows=20 loops=1)
  ->  Sort (cost=955.26..955.27 rows=1 width=24) (actual time=4.385..4.389 rows=20 loops=1)
        Sort Key: activity.end_time DESC
        Sort Method: top-N heapsort Memory: 27kB
        ->  Nested Loop (cost=1.12..955.25 rows=1 width=24) (actual time=0.035..4.244 rows=447 loops=1)
              ->  Index Only Scan using tmp_index_2 on notification (cost=0.56..326.71 rows=124 width=16) (actual time=0.017..1.039 rows=1218 loops=1)
                    Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
                    Heap Fetches: 689
              ->  Index Only Scan using tmp_index_1 on activity (cost=0.56..5.07 rows=1 width=24) (actual time=0.002..0.002 rows=0 loops=1218)
                    Index Cond: ((space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'::uuid) AND (id = notification.activity_id))
                    Heap Fetches: 1
Planning time: 0.484 ms
Execution time: 4.428 ms

One thing that still bothers me about this query is rows=1218 and loops=1218: it loops over every read notification for the user and probes the activity table once per row.

I would like to be able to create an index that covers all of this in a way that simulates denormalizing the data. For example, if I added space_id, type, and end_time to the notification table, I could create the following index and do the read in milliseconds.

create index tmp_index_3 on notification using btree (
user_id,
space_id,
end_time desc
) where (
read = true
and type != 'commented'
);
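
For completeness, the denormalization that tmp_index_3 assumes would look roughly like this sketch; the copied columns would have to be kept in sync with activity by triggers or application code:

-- Hypothetical denormalization: copy the activity columns the query filters
-- and sorts on into notification.
alter table notification
    add column space_id uuid,
    add column type     activity_type,
    add column end_time timestamptz;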

Is this currently impossible in Postgres without denormalizing?

Best Answer

Add these indexes:

create index ix1_activity on activity (space_id, type, end_time, id);

create index ix2_notification on notification (activity_id, user_id, read);

These two "covering indexes" can make your query really fast.

Also, with a bit of luck, it will read the activity table first (only 20 rows) and then perform a Nested Loop Join (NLJ) against notification. That is, the index traversal will be very limited.
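
As an illustration only (this rewrite is not part of the original answer), phrasing the join as an EXISTS is one way to encourage that plan shape, assuming there is at most one notification per (user_id, activity_id):

-- Sketch: start from activity in end_time order and probe notification per row.
-- Assumes at most one notification row per (user_id, activity_id), so EXISTS
-- returns the same rows as the original join.
select a.id
from activity a
where a.space_id = '6d702c09-8795-4185-abb3-dc6b3e8907dc'
  and a.type != 'commented'
  and exists (
        select 1
        from notification n
        where n.activity_id = a.id
          and n.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
          and n.read
      )
order by a.end_time desc
limit 20;

A partial index along the lines of activity (space_id, end_time desc) where type != 'commented' would let Postgres walk activity in sorted order and stop after 20 matches; whether the planner actually chooses that plan depends on its row estimates.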

For "sql - Indexing columns across multiple tables in PostgreSQL", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51973178/
