postgresql - Postgres `gin_trgm_ops` index not being used

Reposted by: 行者123 Updated: 2023-11-29 14:13:47

I'm trying to speed up some text matching in Postgres using the pg_trgm extension:

CREATE TABLE test3 (id bigint, key text, value text);

insert into test3 values (1, 'first 1', 'second 3');
insert into test3 values (2, 'first 1', 'second 2');
insert into test3 values (2, 'first 2', 'second 3');
insert into test3 values (3, 'first 1', 'second 2');
insert into test3 values (3, 'first 1', 'second 3');
insert into test3 values (4, 'first 2', 'second 3');
insert into test3 values (4, 'first 2', 'second 3');
insert into test3 values (4, 'first 1', 'second 2');
insert into test3 values (4, 'first 1', 'second 2');

-- repeat the above 1,000,000x times, to have more rows for benchmarking
insert into test3(id, key, value) select id, key, value from test3 cross join generate_series(1, 1000000);

Now I query this table with ILIKE:

select count(*) from test3 where key = 'first 1' and value ilike '%nd 3%';
Time: 918.265 ms

To see whether an index would speed this up, I added pg_trgm GIN indexes on the key and value columns:

CREATE extension if not exists pg_trgm;
CREATE INDEX test3_key_trgm_idx ON test3 USING gin (key gin_trgm_ops);
CREATE INDEX test3_value_trgm_idx ON test3 USING gin (value gin_trgm_ops);

But the query still takes the same amount of time, and EXPLAIN ANALYZE shows that the indexes are not used at all:

explain analyze select count(*) from test3 where key = 'first 1' and value ilike '%nd 3%';
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=126905.14..126905.15 rows=1 width=8) (actual time=1017.666..1017.667 rows=1 loops=1)
   ->  Gather  (cost=126904.93..126905.14 rows=2 width=8) (actual time=1017.505..1018.778 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=125904.93..125904.94 rows=1 width=8) (actual time=1010.862..1010.862 rows=1 loops=3)
               ->  Parallel Seq Scan on test3  (cost=0.00..122427.06 rows=1391148 width=0) (actual time=0.041..973.550 rows=666667 loops=3)
                     Filter: ((value ~~* '%nd 3%'::text) AND (key = 'first 1'::text))
                     Rows Removed by Filter: 2333336
 Planning Time: 0.266 ms
 Execution Time: 1018.814 ms

Time: 1049.413 ms (00:01.049)

Note the sequential scan. What gives?

Best answer

Never mind, I found the problem.

The query planner is smarter than my toy test set: seeing that a large share of the rows match the query, it chose a sequential scan.
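One way to check this explanation, sketched here against the test3 table from the question, is to count how many rows each predicate matches. In the nine seed rows, the pattern '%nd 3%' alone matches 5 rows and the full WHERE clause matches 2, so neither predicate is selective. The column aliases are only illustrative names.

```sql
-- Count how many rows each predicate matches in the question's test3 table.
-- A predicate that matches a large fraction of the table gives the planner
-- little reason to prefer an index over a sequential scan.
SELECT
    count(*)                                                          AS total_rows,
    count(*) FILTER (WHERE value ILIKE '%nd 3%')                      AS pattern_matches,
    count(*) FILTER (WHERE key = 'first 1' AND value ILIKE '%nd 3%')  AS query_matches
FROM test3;
```

Comparing pattern_matches and query_matches with total_rows shows roughly how selective the query is on this data set.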

If I try ilike '%nd 0%' instead, no rows match, and EXPLAIN ANALYZE correctly reports a Bitmap Index Scan on test3_value_trgm_idx.
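The comparison can be reproduced roughly as follows. This is a sketch that assumes the table and indexes from the question already exist. The enable_seqscan step is an extra diagnostic, not part of the original answer. It only discourages sequential scans for the current session so you can see whether the index is usable at all, and it should not be left on in normal use.

```sql
-- A pattern that matches no rows: the planner is expected to pick the
-- trigram index on value (a Bitmap Index Scan on test3_value_trgm_idx).
EXPLAIN ANALYZE
SELECT count(*) FROM test3 WHERE key = 'first 1' AND value ILIKE '%nd 0%';

-- Diagnostic only: make sequential scans look very expensive for this
-- session, then re-run the original query to see which plan is chosen.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT count(*) FROM test3 WHERE key = 'first 1' AND value ILIKE '%nd 3%';
RESET enable_seqscan;
```

If the second plan uses the index but runs no faster than the sequential scan, that supports the planner's original choice for a pattern this unselective.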

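So normalizing the original JSONB this way is viable. But I'll also try to find and compare another approach, using regular expressions on the TEXT, to avoid having to create and maintain another table.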

This post is based on a similar question on Stack Overflow, postgresql - Postgres `gin_trgm_ops` index not being used: https://stackoverflow.com/questions/56485455/
