gpt4 book ai didi

与 View 连接的 PostgreSQL 查询性能不佳

转载 作者:行者123 更新时间:2023-11-29 13:46:38 25 4
gpt4 key购买 nike

我正在尝试创建一个 API 来返回用户拥有的文件/文件夹列表,类似于 Google 云端硬盘。有关用户文件/文件夹/其他的信息存储在 PostgreSQL 表中。

我创建了一个名为 gc_drive 的 View ,它从这些表中获取文件列表。是的,我称它们为表,因为文件、文件夹和虚拟文件有不同的表,所以我使用 View 将所有这些记录连接在一起,以便它们有效地显示为一个文件系统,即使它们不是 100% 的文件系统。

View 性能看起来不错,我已经为必要的列编制了索引,即使 View 有超过 270 万条记录和所有数据组合,它也会按 ID 过滤并在几毫秒内返回父项过滤的记录。但是,如果我尝试从 gc_drive 中选择 count(*),则需要 69 秒,因为包含文件、文件夹、资源和其他列表的基表每个都有超过 700 万条记录。

我想将 gc_drive View 与 fn_resource_permissions 表结合起来,该表包含有关用户是否有权访问资源的信息。

当我使用以下查询获取当前登录用户的权限时,记录几乎立即返回:

select*from 
fn_resource_permissions
WHERE ((fn_resource_permissions.permission_id IN (23,
24,
25,
37,
36)
AND (fn_resource_permissions.user_id = 2
OR fn_resource_permissions.user_id = 1)))

--59 records returned

上面的查询返回我对用户拥有的资源的所有权限。我将使用这些记录来过滤 gc_drive View 。

当我尝试在以下查询中使用这 59 条记录来过滤掉 gc_drive View 时,记录也会立即返回:

SELECT gc_drive.id,
"fn_resources"."owner_id" AS "fn_resources.owner_id",
"gc_drive".*
FROM "gc_drive"
JOIN "fn_resources" ON ("gc_drive"."resource_id" = "fn_resources"."id")
WHERE "fn_resources"."archived" = 0
AND gc_drive.id IN(
1234,
1235,
1236,
...
);

问题是当我尝试将上面的 2 个查询连接在一起时:

SELECT gc_drive.id,
"fn_resources"."owner_id" AS "fn_resources.owner_id",
"gc_drive".*
FROM "gc_drive"
JOIN "fn_resources" ON ("gc_drive"."resource_id" = "fn_resources"."id")
JOIN "fn_resource_permissions" ON ("fn_resource_permissions"."resource_id" = fn_resources.id)
WHERE "fn_resources"."archived" = 0
AND ((fn_resource_permissions.permission_id IN (23,
24,
25,
37,
36)
AND (fn_resource_permissions.user_id = 2
OR fn_resource_permissions.user_id = 1)))
LIMIT 101;

当我查看 Explain 计划时,我可以看到 PostgreSQL 正在具体化整个 gc_drive View ,然后才尝试按资源权限进行过滤。这使得查询在几分钟内运行,而不是几毫秒。我还尝试将资源权限放在 with 子句中,它也是如此。我知道其中一个解决方案可能是将每个用户的内容分隔在自己的模式中,但我想知道是否有更好更有效的方法来加入 gc_drive View 和其他表,而无需 PostgreSQL 首先具体化整个 View 。有没有办法修改查询,使 PostgreSQL 可以只对整个记录集进行过滤。

解释上述查询的分析计划:

    ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit (cost=4845.56..57547586.51 rows=101 width=722) (actual time=201371.236..201371.236 rows=0 loops=1) │
│ -> Hash Join (cost=4845.56..681971785.33 rows=1197 width=722) (actual time=201371.234..201371.234 rows=0 loops=1) │
│ Hash Cond: ("*SELECT* 1".resource_id = fn_resources.id) │
│ -> Append (cost=36.10..681748786.17 rows=15867640 width=247) (actual time=17521.979..200235.975 rows=2737742 loops=1) │
│ -> Subquery Scan on "*SELECT* 1" (cost=36.10..533636.75 rows=11252 width=150) (actual time=0.004..0.004 rows=0 loops=1) │
│ -> Hash Join (cost=36.10..533524.23 rows=11252 width=150) (actual time=0.003..0.003 rows=0 loops=1) │
│ Hash Cond: (fnr_child.type = fnrt.type) │
│ -> Nested Loop (cost=0.00..66613.04 rows=1940 width=122) (actual time=0.003..0.003 rows=0 loops=1) │
│ -> Nested Loop (cost=0.00..49607.76 rows=1940 width=100) (actual time=0.002..0.002 rows=0 loops=1) │
│ -> Nested Loop (cost=0.00..32602.48 rows=1940 width=104) (actual time=0.002..0.002 rows=0 loops=1) │
│ -> Nested Loop (cost=0.00..16315.94 rows=1940 width=12) (actual time=0.002..0.002 rows=0 loops=1) │
│ -> Seq Scan on gc_virtual_file gvf (cost=0.00..29.40 rows=1940 width=8) (actual time=0.001..0.001 rows=0 loops=1) │
│ -> Index Scan using gc_file_resource_id_key on gc_file gf_child (cost=0.00..8.38 rows=1 width=4) (never executed) │
│ Index Cond: (resource_id = gvf.resource_id) │
│ -> Index Scan using gc_file_resource_id_key on gc_file gf_parent (cost=0.00..8.38 rows=1 width=92) (never executed) │
│ Index Cond: (resource_id = gvf.parent_id) │
│ -> Index Scan using fn_resources_pkey on fn_resources fnr_parent (cost=0.00..8.75 rows=1 width=4) (never executed) │
│ Index Cond: (id = gvf.parent_id) │
│ -> Index Scan using fn_resources_pkey on fn_resources fnr_child (cost=0.00..8.75 rows=1 width=30) (never executed) │
│ Index Cond: (id = gvf.resource_id) │
│ -> Hash (cost=21.60..21.60 rows=1160 width=36) (never executed) │
│ -> Seq Scan on fn_resource_types fnrt (cost=0.00..21.60 rows=1160 width=36) (never executed) │
│ SubPlan 6 │
│ -> Index Scan using users_pkey on fn_users (cost=0.00..8.28 rows=1 width=13) (never executed) │
│ Index Cond: (id = fnr_child.owner_id) │
│ SubPlan 7 │
│ -> Index Scan using gc_tables_resource_id_key on gc_tables gt (cost=0.00..8.28 rows=1 width=18) (never executed) │
│ Index Cond: (resource_id = fnr_child.id) │
│ SubPlan 8 │
│ -> Index Scan using gc_webmapservices_resource_id_key on gc_webmapservices gwms (cost=0.00..8.27 rows=1 width=418) (never executed) │
│ Index Cond: (resource_id = fnr_child.id) │
│ SubPlan 9 │
│ -> Index Scan using gc_tables_resource_id_key on gc_tables gt (cost=0.00..8.31 rows=1 width=14) (never executed) │
│ Index Cond: (resource_id = fnr_child.id) │
│ SubPlan 10 │
│ -> Index Scan using gc_webmapservices_resource_id_key on gc_webmapservices gwms (cost=0.00..8.30 rows=1 width=267) (never executed) │
│ Index Cond: (resource_id = fnr_child.id) │
│ -> Subquery Scan on "*SELECT* 2" (cost=580092.62..681215149.42 rows=15856388 width=247) (actual time=17521.724..199482.039 rows=2737742 loops=1) │
│ -> Hash Join (cost=580092.62..681056585.54 rows=15856388 width=247) (actual time=17521.668..198482.521 rows=2737742 loops=1) │
│ Hash Cond: (fnr_child.type = fnrt.type) │
│ -> Hash Join (cost=580056.53..1189954.39 rows=2733860 width=219) (actual time=16768.431..88474.618 rows=2737742 loops=1) │
│ Hash Cond: (gf_parent.resource_id = fnr_parent.id) │
│ -> Hash Join (cost=388738.70..943959.36 rows=2733860 width=219) (actual time=13619.604..72625.523 rows=2737742 loops=1) │
│ Hash Cond: (gf_child.path = gf_parent.pathname) │
│ -> Hash Join (cost=202378.85..538142.64 rows=2733860 width=168) (actual time=6429.263..53255.517 rows=2737743 loops=1) │
│ Hash Cond: (fnr_child.id = gf_child.resource_id) │
│ -> Seq Scan on fn_resources fnr_child (cost=0.00..114415.70 rows=6152170 width=30) (actual time=21.243..19364.289 rows=6119914 loops=1) │
│ -> Hash (cost=112139.60..112139.60 rows=2733860 width=142) (actual time=5666.626..5666.626 rows=2737743 loops=1) │
│ Buckets: 262144 Batches: 2 Memory Usage: 240441kB │
│ -> Seq Scan on gc_file gf_child (cost=0.00..112139.60 rows=2733860 width=142) (actual time=0.015..3442.788 rows=2737751 loops=1) │
│ -> Hash (cost=112139.60..112139.60 rows=2733860 width=92) (actual time=7173.094..7173.094 rows=2737751 loops=1) │
│ Buckets: 262144 Batches: 2 Memory Usage: 169423kB │
│ -> Seq Scan on gc_file gf_parent (cost=0.00..112139.60 rows=2733860 width=92) (actual time=0.007..5057.412 rows=2737751 loops=1) │
│ -> Hash (cost=114415.70..114415.70 rows=6152170 width=4) (actual time=3145.424..3145.424 rows=6119914 loops=1) │
│ Buckets: 1048576 Batches: 1 Memory Usage: 215154kB │
│ -> Seq Scan on fn_resources fnr_parent (cost=0.00..114415.70 rows=6152170 width=4) (actual time=0.008..1186.819 rows=6119914 loops=1) │
│ -> Hash (cost=21.60..21.60 rows=1160 width=36) (actual time=0.021..0.021 rows=21 loops=1) │
│ Buckets: 1024 Batches: 1 Memory Usage: 1kB │
│ -> Seq Scan on fn_resource_types fnrt (cost=0.00..21.60 rows=1160 width=36) (actual time=0.005..0.012 rows=21 loops=1) │
│ SubPlan 1 │
│ -> Index Scan using users_pkey on fn_users (cost=0.00..8.28 rows=1 width=13) (actual time=0.009..0.009 rows=1 loops=2737742) │
│ Index Cond: (id = fnr_child.owner_id) │
│ SubPlan 2 │
│ -> Index Scan using gc_file_resource_id_key on gc_file gf (cost=0.00..8.62 rows=1 width=47) (never executed) │
│ Index Cond: (resource_id = fnr_child.id) │
│ SubPlan 3 │
│ -> Index Scan using gc_file_resource_id_key on gc_file gf (cost=0.00..8.62 rows=1 width=47) (actual time=0.025..0.025 rows=1 loops=2665337) │
│ Index Cond: (resource_id = fnr_child.id) │
│ SubPlan 4 │
│ -> Index Scan using gc_file_resource_id_key on gc_file gf (cost=0.00..8.63 rows=1 width=129) (never executed) │
│ Index Cond: (resource_id = fnr_child.id) │
│ SubPlan 5 │
│ -> Index Scan using gc_file_resource_id_key on gc_file gf (cost=0.00..8.66 rows=1 width=145) (actual time=0.003..0.003 rows=1 loops=2665337) │
│ Index Cond: (resource_id = fnr_child.id) │
│ -> Hash (cost=4804.79..4804.79 rows=374 width=12) (actual time=412.434..412.434 rows=59 loops=1) │
│ Buckets: 1024 Batches: 1 Memory Usage: 3kB │
│ -> Nested Loop (cost=13.32..4804.79 rows=374 width=12) (actual time=156.510..412.313 rows=59 loops=1) │
│ -> Bitmap Heap Scan on fn_resource_permissions (cost=13.32..1454.38 rows=374 width=4) (actual time=155.407..402.734 rows=59 loops=1) │
│ Recheck Cond: ((user_id = 2) OR (user_id = 1)) │
│ Filter: (permission_id = ANY ('{23,24,25,37,36}'::integer[])) │
│ -> BitmapOr (cost=13.32..13.32 rows=391 width=0) (actual time=88.402..88.402 rows=0 loops=1) │
│ -> Bitmap Index Scan on resource_permissions_user (cost=0.00..6.57 rows=196 width=0) (actual time=88.380..88.380 rows=61 loops=1) │
│ Index Cond: (user_id = 2) │
│ -> Bitmap Index Scan on resource_permissions_user (cost=0.00..6.57 rows=196 width=0) (actual time=0.016..0.016 rows=0 loops=1) │
│ Index Cond: (user_id = 1) │
│ -> Index Scan using fn_resources_pkey on fn_resources (cost=0.00..8.95 rows=1 width=8) (actual time=0.154..0.155 rows=1 loops=59) │
│ Index Cond: (id = fn_resource_permissions.resource_id) │
│ Filter: (archived = 0) │
│ Total runtime: 201373.236 ms │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

非常感谢。

最佳答案

您的查询的结构使得 fn_resourcesfn_resource_permissions 过滤,然后加入到 gc_drive 中,尽管 中存在相同的数据>gc_drive.

这有点预感,但请尝试使用 where 子句尽早通过 fn_resource_permissions 过滤 gc_drive,而不是加入一个连接。

select
gc_drive.id,
fn_resources.owner_id,
gc_drive.*
from
gc_drive
join fn_resources on(
gc_drive.resource_id = fn_resources.id
)
where
gc_drive.resource_id in(
select
fn_resource_permissions.resource_id
from
fn_resource_permissions
where
fn_resource_permissions.permission_id in (23, 24, 25, 37, 36)
and fn_resource_permissions.user_id in(1, 2)
)
and fn_resources.archived = 0
limit 101;

或者,尝试在加入之前过滤fn_resources

select
gc_drive.id,
fn_resources.owner_id,
gc_drive.*
from
gc_drive
join fn_resources on(
gc_drive.resource_id = fn_resources.id
and fn_resources.id in(
select
fn_resource_permissions.resource_id
from
fn_resource_permissions
where
fn_resource_permissions.permission_id in (23, 24, 25, 37, 36)
and fn_resource_permissions.user_id in(1, 2)
)
)
where
fn_resources.archived = 0
limit 101;

关于与 View 连接的 PostgreSQL 查询性能不佳,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/47259983/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com