gpt4 book ai didi

postgresql - postgres 不使用索引

转载 作者:行者123 更新时间:2023-11-29 14:37:46 27 4
gpt4 key购买 nike

关于这个主题有很多问题,但所有这些问题似乎都比我现在看到的更复杂,而且答案似乎不适用。

OHDSI=> \d record_counts
Table "results2.record_counts"
Column | Type | Modifiers
------------------------+-----------------------+-----------
concept_id | integer |
schema | text |
table_name | text |
column_name | text |
column_type | text |
descendant_concept_ids | bigint |
rc | numeric |
drc | numeric |
domain_id | character varying(20) |
vocabulary_id | character varying(20) |
concept_class_id | character varying(20) |
standard_concept | character varying(1) |
Indexes:
"rc_dom" btree (domain_id, concept_id)
"rcdom" btree (domain_id)
"rcdomvocsc" btree (domain_id, vocabulary_id, standard_concept)

该表有 3,133,778 条记录,因此 Postgres 不应该因为表太小而忽略该索引。

我在索引的domain_id上过滤,索引被忽略:

OHDSI=> explain select * from record_counts where domain_id = 'Drug';
QUERY PLAN
------------------------------------------------------------------------
Seq Scan on record_counts (cost=0.00..76744.81 rows=2079187 width=87)
Filter: ((domain_id)::text = 'Drug'::text)

我关闭 seqscan 并且:

OHDSI=> set enable_seqscan=false;
SET
OHDSI=> explain select * from record_counts where domain_id = 'Drug';
QUERY PLAN
-------------------------------------------------------------------------------------
Bitmap Heap Scan on record_counts (cost=42042.13..105605.97 rows=2079187 width=87)
Recheck Cond: ((domain_id)::text = 'Drug'::text)
-> Bitmap Index Scan on rcdom (cost=0.00..41522.33 rows=2079187 width=0)
Index Cond: ((domain_id)::text = 'Drug'::text)

确实,该计划说使用索引比不使用索引的成本更高,但为什么呢?如果索引让它处理的记录少得多,使用起来不是更快吗?

最佳答案

好吧,看起来 Postgres 知道它在做什么。我使用的索引列(“药物”)的特定值恰好占表中行的 66%。所以,是的,过滤器使行集显着变小,但由于这些行会分散在页面之间,因此索引不允许更快地检索它们。

OHDSI=> select domain_id, count(*) as rows, round((100 * count(*)::float / 3133778.0)::numeric,4) pct from record_counts group by 1 order by 2 desc;
domain_id | rows | pct
---------------------+---------+---------
Drug | 2074991 | 66.2137
Condition | 466882 | 14.8984
Observation | 217807 | 6.9503
Procedure | 165800 | 5.2907
Measurement | 127239 | 4.0602
Device | 29410 | 0.9385
Spec Anatomic Site | 28783 | 0.9185
Meas Value | 10415 | 0.3323
Unit | 2350 | 0.0750
Type Concept | 2170 | 0.0692
Provider Specialty | 1957 | 0.0624
Specimen | 1767 | 0.0564
Metadata | 1689 | 0.0539
Revenue Code | 538 | 0.0172
Place of Service | 480 | 0.0153
Race | 467 | 0.0149
Relationship | 242 | 0.0077
Condition/Obs | 182 | 0.0058
Currency | 180 | 0.0057
Condition/Meas | 115 | 0.0037
Route | 81 | 0.0026
Obs/Procedure | 78 | 0.0025
Condition/Device | 52 | 0.0017
Condition/Procedure | 25 | 0.0008
Meas/Procedure | 25 | 0.0008
Gender | 19 | 0.0006
Device/Procedure | 9 | 0.0003
Meas Value Operator | 9 | 0.0003
Visit | 8 | 0.0003
Drug/Procedure | 3 | 0.0001
Spec Disease Status | 3 | 0.0001
Ethnicity | 2 | 0.0001

当我在 where 子句中使用任何其他值时(包括“条件”,占行的 15%),Postgres 会使用索引。

(有点奇怪,即使我根据 domain_id 索引对表进行聚类,当我过滤“药物”时它仍然不使用该索引,但是过滤掉 34% 的行的性能提升并没有似乎值得进一步研究。)

关于postgresql - postgres 不使用索引,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41806803/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com