sql - Speed up a query with multiple joins, GROUP BY and ORDER BY


I have an SQL query:

SELECT
    C.title,
    COUNT(DISTINCT A.id) AS count_title
FROM
    B
    INNER JOIN D ON B.app = D.app
    INNER JOIN A ON D.number = A.number
    INNER JOIN C ON A.id = C.id
GROUP BY C.title
ORDER BY count_title DESC
LIMIT 10;

Table D has about 50M rows, table A about 30M rows, and tables B and C about 30k rows each. Indexes are defined on all of the columns used in the joins, the GROUP BY and the ORDER BY.
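
For reference, the plan below mentions the indexes B_index_app, D_index_action_type and A_index_number; a rough sketch of the implied definitions could look like the following (the indexes on C are an assumption, and the real names and column lists may differ):

CREATE INDEX B_index_app ON B (app);
CREATE INDEX D_index_action_type ON D (action_type);
CREATE INDEX A_index_number ON A (number);
-- assumed, not visible in the plan:
CREATE INDEX ON C (id);
CREATE INDEX ON C (title);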

The query runs fine without the ORDER BY clause and returns results in about 2-3 seconds.

With the sort (ORDER BY), however, the query time goes up to 10-12 seconds.

I understand the reason for this: the executor has to go through all of the rows for the sort, and the indexes are of little help here.
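
One way to confirm where the time goes is to run the statement under EXPLAIN (ANALYZE, BUFFERS) once with and once without the ORDER BY and compare the two plans; a minimal sketch (it simply wraps the query shown above):

EXPLAIN (ANALYZE, BUFFERS)
SELECT
    C.title,
    COUNT(DISTINCT A.id) AS count_title
FROM
    B
    INNER JOIN D ON B.app = D.app
    INNER JOIN A ON D.number = A.number
    INNER JOIN C ON A.id = C.id
GROUP BY C.title
ORDER BY count_title DESC  -- drop this line for the comparison run
LIMIT 10;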

Is there any other way to speed this query up?

Here is the EXPLAIN ANALYZE output for the query:

"QUERY PLAN"
"Limit (cost=974652.20..974652.22 rows=10 width=54) (actual time=2817.579..2825.071 rows=10 loops=1)"
" Buffers: shared hit=120299 read=573195"
" -> Sort (cost=974652.20..974666.79 rows=5839 width=54) (actual time=2817.578..2817.578 rows=10 loops=1)"
" Sort Key: (count(DISTINCT A.id)) DESC"
" Sort Method: top-N heapsort Memory: 26kB"
" Buffers: shared hit=120299 read=573195"
" -> GroupAggregate (cost=974325.65..974526.02 rows=5839 width=54) (actual time=2792.465..2817.097 rows=3618 loops=1)"
" Group Key: C.title"
" Buffers: shared hit=120299 read=573195"
" -> Sort (cost=974325.65..974372.97 rows=18931 width=32) (actual time=2792.451..2795.161 rows=45175 loops=1)"
" Sort Key: C.title"
" Sort Method: quicksort Memory: 5055kB"
" Buffers: shared hit=120299 read=573195"
" -> Gather (cost=968845.30..972980.74 rows=18931 width=32) (actual time=2753.402..2778.648 rows=45175 loops=1)"
" Workers Planned: 1"
" Workers Launched: 1"
" Buffers: shared hit=120299 read=573195"
" -> Parallel Hash Join (cost=967845.30..970087.64 rows=11136 width=32) (actual time=2751.725..2764.832 rows=22588 loops=2)"
" Hash Cond: ((C.id)::text = (A.id)::text)"
" Buffers: shared hit=120299 read=573195"
" -> Parallel Seq Scan on C (cost=0.00..1945.87 rows=66687 width=32) (actual time=0.017..4.316 rows=56684 loops=2)"
" Buffers: shared read=1279"
" -> Parallel Hash (cost=966604.55..966604.55 rows=99260 width=9) (actual time=2750.987..2750.987 rows=20950 loops=2)"
" Buckets: 262144 Batches: 1 Memory Usage: 4032kB"
" Buffers: shared hit=120266 read=571904"
" -> Nested Loop (cost=219572.23..966604.55 rows=99260 width=9) (actual time=665.832..2744.270 rows=20950 loops=2)"
" Buffers: shared hit=120266 read=571904"
" -> Parallel Hash Join (cost=219571.79..917516.91 rows=99260 width=4) (actual time=665.804..2583.675 rows=20950 loops=2)"
" Hash Cond: ((D.app)::text = (B.app)::text)"
" Buffers: shared hit=8 read=524214"
" -> Parallel Bitmap Heap Scan on D (cost=217542.51..895848.77 rows=5126741 width=13) (actual time=661.254..1861.862 rows=6160441 loops=2)"
" Recheck Cond: ((action_type)::text = ANY ('{10,11}'::text[]))"
" Heap Blocks: exact=242152"
" Buffers: shared hit=3 read=523925"
" -> Bitmap Index Scan on D_index_action_type (cost=0.00..214466.46 rows=12304178 width=0) (actual time=546.470..546.471 rows=12320882 loops=1)"
" Index Cond: ((action_type)::text = ANY ('{10,11}'::text[]))"
" Buffers: shared hit=3 read=33669"
" -> Parallel Hash (cost=1859.36..1859.36 rows=13594 width=12) (actual time=4.337..4.337 rows=16313 loops=2)"
" Buckets: 32768 Batches: 1 Memory Usage: 1152kB"
" Buffers: shared hit=5 read=289"
" -> Parallel Index Only Scan using B_index_app on B (cost=0.29..1859.36 rows=13594 width=12) (actual time=0.015..2.218 rows=16313 loops=2)"
" Heap Fetches: 0"
" Buffers: shared hit=5 read=289"
" -> Index Scan using A_index_number on A (cost=0.43..0.48 rows=1 width=24) (actual time=0.007..0.007 rows=1 loops=41900)"
" Index Cond: ((number)::text = (D.number)::text)"
" Buffers: shared hit=120258 read=47690"
"Planning Time: 0.747 ms"
"Execution Time: 2825.118 ms"

Best Answer

You can try a nested loop join between b and d, since b is much smaller:

CREATE INDEX ON d (app);

If d is vacuumed often enough, you can check whether an index-only scan is even faster. For that, also include number in the index (in v11, use the INCLUDE clause for this!). The EXPLAIN output suggests that you have an additional condition on action_type; for an index-only scan you would have to include that column as well.
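
Building on the CREATE INDEX above, a minimal sketch of the index-only-scan variant, assuming PostgreSQL 11 or later and that app, number and action_type are the only columns of d the query touches (adjust to the real schema):

-- Covering index: number is carried in the index, and the action_type
-- condition can be checked without visiting the heap.
CREATE INDEX ON d (app, action_type) INCLUDE (number);

-- Index-only scans only skip heap fetches while the visibility map is up to date.
VACUUM (ANALYZE) d;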

Regarding sql - Speed up a query with multiple joins, GROUP BY and ORDER BY, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56334762/
