作者热门文章
- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
我的 tracks
表包含大约 300 万条记录(每天增加 500 条),大约有 30 列,但我在 WHERE
子句中只使用了 15 列。查询平均需要 4800 毫秒,没有其他用户/进程使用数据库。如何让它更快?我希望看到接近 100 毫秒的结果。
寻找歌曲(轨道)的人填写表格:
99% 的用例是 SELECT 查询:
SELECT
"public"."tracks"."sys_id",
"public"."tracks"."all_artists",
"public"."tracks"."name",
"public"."tracks"."genres",
"public"."tracks"."release_date",
"public"."tracks"."tempo",
"public"."tracks"."popularity",
"public"."tracks"."danceability",
"public"."tracks"."energy",
"public"."tracks"."speechiness",
"public"."tracks"."acousticness",
"public"."tracks"."instrumentalness",
"public"."tracks"."liveness",
"public"."tracks"."valence",
"public"."tracks"."main_artist_popularity",
"public"."tracks"."main_artist_followers",
"public"."tracks"."key",
"public"."tracks"."preview_url"
FROM
"public"."tracks"
WHERE
(
"public"."tracks"."name" LIKE '%oultec%'
OR "public"."tracks"."all_artists_string" LIKE '%oultec%'
)
AND ("public"."tracks"."genres_string" LIKE '%rum%')
AND "public"."tracks"."tempo" >= '80'
AND "public"."tracks"."tempo" <= '210'
AND "public"."tracks"."popularity" >= '0'
AND "public"."tracks"."popularity" <= '100'
AND "public"."tracks"."main_artist_popularity" >= '1'
AND "public"."tracks"."main_artist_popularity" <= '100'
AND "public"."tracks"."main_artist_followers" >= '1'
AND "public"."tracks"."main_artist_followers" <= '50000000'
AND "public"."tracks"."danceability" >= '0'
AND "public"."tracks"."danceability" <= '1000'
AND "public"."tracks"."energy" >= '0'
AND "public"."tracks"."energy" <= '1000'
AND "public"."tracks"."speechiness" >= '0'
AND "public"."tracks"."speechiness" <= '1000'
AND "public"."tracks"."acousticness" >= '0'
AND "public"."tracks"."acousticness" <= '1000'
AND "public"."tracks"."instrumentalness" >= '0'
AND "public"."tracks"."instrumentalness" <= '1000'
AND "public"."tracks"."liveness" >= '0'
AND "public"."tracks"."liveness" <= '1000'
AND "public"."tracks"."valence" >= '0'
AND "public"."tracks"."valence" <= '1000'
AND "public"."tracks"."release_date" >= '2020-01-01'
AND "public"."tracks"."key" = '10'
ORDER BY
"public"."tracks"."release_date" DESC,
"public"."tracks"."popularity" DESC,
"public"."tracks"."sys_id" ASC
LIMIT 5 OFFSET 0;
索引(索引):
PRIMARY sys_id
UNIQUE main_artist, name, duration_ms
INDEX energy
INDEX tempo, popularity, main_artist_popularity, main_artist_followers, danceability, energy, speechiness, acousticness, instrumentalness, liveness, valence, name, all_artists_string, genres_string, release_date, key
解释
/分析
:
Limit (cost=308411.32..308411.33 rows=1 width=279) (actual time=4582.756..4584.010 rows=0 loops=1)
-> Sort (cost=308411.32..308411.33 rows=1 width=279) (actual time=4546.306..4547.559 rows=0 loops=1)
Sort Key: release_date DESC, popularity DESC, sys_id
Sort Method: quicksort Memory: 25kB
-> Gather (cost=1000.00..308411.31 rows=1 width=279) (actual time=4546.121..4547.374 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tracks (cost=0.00..307411.21 rows=1 width=279) (actual time=4473.923..4473.926 rows=0 loops=3)
Filter: ((genres_string ~~ '%rum%'::text) AND (tempo >= '80'::smallint) AND (tempo <= '210'::smallint) AND (popularity >= '0'::smallint) AND (popularity <= '100'::smallint) AND (main_artist_popularity >= '1'::smallint) AND (main_artist_popularity <= '100'::smallint) AND (main_artist_followers >= 1) AND (main_artist_followers <= 50000000) AND (danceability >= '0'::smallint) AND (danceability <= '1000'::smallint) AND (energy >= '0'::smallint) AND (energy <= '1000'::smallint) AND (speechiness >= '0'::smallint) AND (speechiness <= '1000'::smallint) AND (acousticness >= '0'::smallint) AND (acousticness <= '1000'::smallint) AND (instrumentalness >= '0'::smallint) AND (instrumentalness <= '1000'::smallint) AND (liveness >= '0'::smallint) AND (liveness <= '1000'::smallint) AND (valence >= '0'::smallint) AND (valence <= '1000'::smallint) AND (release_date >= '2020-01-01'::date) AND (key = '10'::smallint) AND ((name ~~ '%oultec%'::text) OR (all_artists_string ~~ '%oultec%'::text)))
Rows Removed by Filter: 1034268
Planning Time: 5.145 ms
JIT:
Functions: 14
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 18.941 ms, Inlining 0.000 ms, Optimization 8.644 ms, Emission 110.864 ms, Total 138.449 ms
Execution Time: 4667.446 ms
从“官方”镜像运行 PostgreSQL:postgres:14.1-alpine
:
表结构:
运行查询的网站(通过 API/后端,更多带有最小/最大整数的字段,但此处未显示):
最佳答案
您的查询对日期和数字进行LIKE '%something%'
全文搜索和范围扫描。但 BTREE 索引(默认情况下)只能处理一次范围扫描。而且,他们根本无法处理LIKE '%something%'
。因此,您将对每个查询进行全表扫描。考虑到三个巨型行的 4.8 秒还算不错。
对于列 LIKE '%something%'
搜索,您可以尝试使用三元组索引,这是 postgreSQL 的一项功能。此代码将在 name
上创建三元组索引。这可能会缩小选择范围,从而使您必须扫描更少的数据。
CREATE EXTENSION pg_trgm; -- you may or may not need this statement.
CREATE INDEX CONCURRENTLY tracks_name
ON tracks
USING GIN (name gin_trgm_ops);
CREATE INDEX CONCURRENTLY tracks_all_artists_string
ON tracks
USING GIN (all_artists_string gin_trgm_ops);
CREATE INDEX CONCURRENTLY tracks_genres_string
ON tracks
USING GIN (genres_string gin_trgm_ops);
但是您仍然需要扫描所有匹配的轨道。
如果您创建这些索引,然后重构 WHERE 子句的前几位以使用像这样的集合计算,您可能(或可能不会)获得更好的性能。
WHERE sys_id IN (
(SELECT sys_id FROM tracks WHERE name LIKE '%oultec%'
UNION
SELECT sys_id FROM tracks WHERE all_artists_string LIKE '%oultec%'
)
INTERSECT
SELECT sys_id FROM tracks WHERE genres_string LIKE '%oultec%'
)
AND tempo >= '80' ...
但事实是 SQL 并不适合所有这些范围扫描。
关于sql - 如何让查询更快?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/74047973/
我是一名优秀的程序员,十分优秀!