
postgresql - Postgres-specific index for sorting and joining

Reposted · Author: 行者123 · Updated: 2023-11-29 11:47:00

I have a simple schema and query, but with certain parameters I'm seeing consistently poor performance.

Schema:

CREATE EXTENSION IF NOT EXISTS citext;  -- provides the citext type used by barcodes.value

CREATE TABLE locations (
    id integer NOT NULL,
    barcode_id integer NOT NULL
);

CREATE TABLE barcodes (
    id integer NOT NULL,
    value citext NOT NULL
);

ALTER TABLE ONLY locations ADD CONSTRAINT locations_pkey PRIMARY KEY (id);
ALTER TABLE ONLY barcodes ADD CONSTRAINT barcodes_pkey PRIMARY KEY (id);
ALTER TABLE ONLY locations ADD CONSTRAINT fk_locations_barcodes FOREIGN KEY (barcode_id) REFERENCES barcodes(id);

CREATE INDEX index_barcodes_on_value ON barcodes (value);
CREATE INDEX index_locations_on_barcode_id ON locations (barcode_id);

Query:

EXPLAIN ANALYZE
SELECT *
FROM locations
JOIN barcodes ON locations.barcode_id = barcodes.id
ORDER BY barcodes.value ASC
LIMIT 50;

Analysis:

Limit  (cost=0.71..3564.01 rows=50 width=34) (actual time=0.043..683.025 rows=50 loops=1)
  ->  Nested Loop  (cost=0.71..4090955.00 rows=57404 width=34) (actual time=0.043..683.017 rows=50 loops=1)
        ->  Index Scan using index_barcodes_on_value on barcodes  (cost=0.42..26865.99 rows=496422 width=15) (actual time=0.023..218.775 rows=372138 loops=1)
        ->  Index Scan using index_locations_on_barcode_id on locations  (cost=0.29..5.32 rows=287 width=8) (actual time=0.001..0.001 rows=0 loops=372138)
              Index Cond: (barcode_id = barcodes.id)
Planning time: 0.167 ms
Execution time: 683.078 ms

500+ ms doesn't make sense for the number of entries in my schema (500,000 barcodes and 60,000 locations). Is there anything I can do to improve performance?

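Note:

Stranger still, the execution time depends on the data. While drafting this question I tried to include seeded random data, but the seeded data turns out to perform well: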

Seed:

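CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- gen_random_uuid() is built in only from PostgreSQL 13 on
INSERT INTO barcodes (id, value) SELECT seed.id, gen_random_uuid() FROM generate_series(1,500000) AS seed(id);
-- 1 + floor(...) keeps barcode_id within 1..500000; a rounded RANDOM() * 500000 can be 0, which violates the foreign key
INSERT INTO locations (id, barcode_id) SELECT seed.id, (1 + floor(RANDOM() * 500000))::integer FROM generate_series(1,60000) AS seed(id);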

Analysis:

Limit  (cost=0.71..3602.63 rows=50 width=86) (actual time=0.089..1.123 rows=50 loops=1)
  ->  Nested Loop  (cost=0.71..4330662.42 rows=60116 width=86) (actual time=0.088..1.115 rows=50 loops=1)
        ->  Index Scan using index_barcodes_on_value on barcodes  (cost=0.42..44972.42 rows=500000 width=41) (actual time=0.006..0.319 rows=376 loops=1)
        ->  Index Scan using index_locations_on_barcode_id on locations  (cost=0.29..5.56 rows=301 width=8) (actual time=0.002..0.002 rows=0 loops=376)
              Index Cond: (barcode_id = barcodes.id)
Planning time: 0.213 ms
Execution time: 1.152 ms
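The answer below attributes the slow plan to a specific skew: the lowest-valued barcodes have no location. Uniformly random seed data does not reproduce that skew. The following is a minimal sketch for recreating it, assuming the barcodes seed above has already been loaded. It discards the random locations and attaches locations only to the 60,000 highest-valued barcodes, so an index scan on barcodes.value must walk past every unmatched barcode before the join produces a row.

```sql
-- Hypothetical reseed: the lowest barcode values get no location at all.
TRUNCATE locations;

-- Attach one location to each of the 60,000 highest-valued barcodes.
INSERT INTO locations (id, barcode_id)
SELECT row_number() OVER (ORDER BY value DESC), id
FROM barcodes
ORDER BY value DESC
LIMIT 60000;

-- Refresh planner statistics for the new contents.
ANALYZE locations;

-- Re-run the original query and compare the plan with the fast one above.
EXPLAIN ANALYZE
SELECT *
FROM locations
JOIN barcodes ON locations.barcode_id = barcodes.id
ORDER BY barcodes.value ASC
LIMIT 50;
```

If the planner keeps the Nested Loop plan here, the outer Index Scan on barcodes would have to read roughly 440,000 rows: the unmatched barcodes plus the 50 matches. That is the same shape as the rows=372138 in the slow plan from the real data.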

Edit:

Table analysis:

ANALYZE VERBOSE barcodes;
INFO: analyzing "public.barcodes"
INFO: "barcodes": scanned 2760 of 2760 pages, containing 496157 live rows and 0 dead rows; 30000 rows in sample, 496157 estimated total rows
ANALYZE
Time: 62.937 ms

ANALYZE VERBOSE locations;
INFO: analyzing "public.locations"
INFO: "locations": scanned 254 of 254 pages, containing 57394 live rows and 0 dead rows; 30000 rows in sample, 57394 estimated total rows
ANALYZE
Time: 21.447 ms

Best answer

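The problem is that the barcodes with low values have no matches in locations, and PostgreSQL has no way of knowing that. So it plans to fetch barcodes in the correct output order via the index and then join rows from locations until it has found 50 of them. That works out much worse than expected.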
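One way to check this explanation against the real data is to count how many barcodes, in value order, sort at or before the 50th row the query returns. The query below is a sketch under that reading of the plan and assumes the schema from the question. If barcodes.value has no duplicates, the count approximates how many rows the Index Scan on index_barcodes_on_value reads before the LIMIT is satisfied, so it can be compared with the rows=372138 reported in the slow plan.

```sql
-- Value of the 50th row returned by the original query.
WITH fiftieth AS (
    SELECT barcodes.value
    FROM locations
    JOIN barcodes ON locations.barcode_id = barcodes.id
    ORDER BY barcodes.value ASC
    OFFSET 49
    LIMIT 1
)
-- Barcodes sorting at or before that value: roughly the rows the outer
-- index scan must read before the nested loop produces 50 results.
SELECT count(*) AS barcodes_scanned
FROM barcodes
CROSS JOIN fiftieth
WHERE barcodes.value <= fiftieth.value;
```

A result far larger than 50 means most of the low-valued barcodes have no location. That is exactly the case the planner's uniform-distribution estimate does not anticipate.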

I would ANALYZE the tables and then run

DROP INDEX index_barcodes_on_value;

That should keep PostgreSQL from choosing that plan.
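Because DDL is transactional in PostgreSQL, the effect of dropping the index can be tried before committing to it. The following is a sketch assuming the schema from the question: the DROP INDEX and the EXPLAIN ANALYZE run inside one transaction that is then rolled back, so the index is still there afterwards. While the transaction is open, DROP INDEX holds an ACCESS EXCLUSIVE lock on barcodes, so other sessions touching that table will wait until the ROLLBACK.

```sql
-- Refresh planner statistics first, as the answer suggests.
ANALYZE barcodes;
ANALYZE locations;

BEGIN;
-- Drop the index only within this transaction.
DROP INDEX index_barcodes_on_value;

-- See which plan PostgreSQL picks without the index on barcodes.value.
EXPLAIN ANALYZE
SELECT *
FROM locations
JOIN barcodes ON locations.barcode_id = barcodes.id
ORDER BY barcodes.value ASC
LIMIT 50;

-- Undo the DROP INDEX; use COMMIT instead to make it permanent.
ROLLBACK;
```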

I don't know which plan PostgreSQL would choose then. For a nested loop, the following index might help:

CREATE INDEX ON locations(id);
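Before adding an index, it can help to list the ones that already exist on both tables. The primary key constraint locations_pkey in the question's schema already creates a unique index on locations (id). The following is a sketch using the standard pg_indexes view, assuming both tables live in the public schema:

```sql
-- List every index on the two tables together with its definition.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename IN ('locations', 'barcodes')
ORDER BY tablename, indexname;
```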

A similar question about postgresql - Postgres-specific index for sorting and joining can be found on Stack Overflow: https://stackoverflow.com/questions/43083895/
