gpt4 book ai didi

sql - 为什么我使用 LIKE 的查询执行序列扫描?

转载 作者:行者123 更新时间:2023-11-29 14:15:35 25 4
gpt4 key购买 nike

我有一个包含 220 万行的表。

                                     Table "public.index"
Column | Type | Modifiers
-----------+-----------------------------+-----------------------------------------------------
fid | integer | not null default nextval('index_fid_seq'::regclass)
location | character varying |
Indexes:
"index_pkey" PRIMARY KEY, btree (fid)
"location_index" btree (location text_pattern_ops)

位置是文件的完整路径,但我需要使用文件所在文件夹的名称进行查询。该文件夹名称在表中是唯一的。

为了避免以 % 开头,我搜索了我知道的完整路径:

select fid from index where location like '/path/to/folder/%'

解释分析:

    QUERY PLAN                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on index (cost=0.00..120223.34 rows=217 width=4) (actual time=1181.701..1181.701 rows=0 loops=1)
Filter: ((location)::text ~~ '/path/to/folder/%'::text)
Rows Removed by Filter: 2166034
Planning time: 0.954 ms
Execution time: 1181.748 ms
(5 rows)

问题不在于如何解决,因为我发现对于我的情况:

创建foldername_index

create index on index (substring(location, '(?<=/path/to/)[^\/]*');

我可以成功使用folder_name查询:

explain analyze select fid from index where substring(location, '(?<=/path/to/)[^\/]*') = 'foldername';

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on index (cost=600.49..31524.74 rows=10830 width=12) (actual time=0.030..0.030 rows=1 loops=1)
Recheck Cond: ("substring"((location)::text, '(?<=/path/to/)[^\/]*'::text) = 'folder_name'::text)
Heap Blocks: exact=1
-> Bitmap Index Scan on foldername_index (cost=0.00..597.78 rows=10830 width=0) (actual time=0.023..0.023 rows=1 loops=1)
Index Cond: ("substring"((location)::text, '(?<=/path/to/)[^\/]*'::text) = 'folder_name'::text)
Planning time: 0.115 ms
Execution time: 0.059 ms
(7 rows)

我已经关注了 PostgreSQL FAQ :

When using wild-card operators such as LIKE or ~, indexes can only be used in certain circumstances:

The beginning of the search string must be anchored to the start of the string, i.e.

LIKE patterns must not start with % or _.

The search string can not start with a character class, e.g. [a-e].

在我的查询中并非如此。

C locale must be used during initdb because sorting in a non-C locale often doesn't match the behavior of LIKE. You can create a special text_pattern_ops index that will work in such cases, but note it is only helpful for LIKE indexing.

我有 C 语言环境:

# show LC_COLLATE;
lc_collate
------------
C
(1 row)

我也遵循了这个 great answer 的说明在 Stack Overflow 上,这就是我使用 text_pattern_ops 的原因,它没有改变任何东西。不幸的是,我无法安装新模块。

那么:为什么我的查询执行序列扫描?

最佳答案

经过反复思考,我自己找到了解决方案。尽管对某些人来说这可能是显而易见的,但它可能对其他人有帮助:

/path/to/folder其实就是/the_path/to/folder/(路径中有下划线)。但是 _ 是 SQL 中的通配符(如 %)。

select fid from index where location like '/the_path/to/folder/%'

使用 seq 扫描,因为索引无法过滤任何行,因为下划线之前的部分对于所有行都是相同的。

select fid from index where location like '/the\_path/to/folder/%'

使用索引扫描。

关于sql - 为什么我使用 LIKE 的查询执行序列扫描?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/49169005/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com