gpt4 book ai didi

sql - 按联接表中的列排序的慢查询

转载 作者:行者123 更新时间:2023-11-29 11:17:34 24 4
gpt4 key购买 nike

在查询中引入 ORDER BY 子句会增加总时间,因为数据库必须执行额外的工作才能对结果集进行排序:

  • 将生成的元组复制到一些临时内存中
  • 对它们进行排序(希望在内存中,否则使用磁盘)
  • 将结果流式传输给客户端

我想念的是为什么仅从连接表中添加一列会产生如此不同的性能。

查询1

EXPLAIN ANALYZE
SELECT p.*
FROM product_product p
JOIN django_site d ON (p.site_id = d.id)
WHERE (p.active = true AND p.site_id = 1 )
ORDER BY d.domain, p.ordering, p.name

查询计划

Sort  (cost=3909.83..3952.21 rows=16954 width=1086) (actual time=1120.618..1143.922 rows=16946 loops=1)
Sort Key: django_site.domain, product_product.ordering, product_product.name
Sort Method: quicksort Memory: 25517kB
-> Nested Loop (cost=0.00..2718.86 rows=16954 width=1086) (actual time=0.053..87.396 rows=16946 loops=1)
-> Seq Scan on django_site (cost=0.00..1.01 rows=1 width=24) (actual time=0.010..0.012 rows=1 loops=1)
Filter: (id = 1)
-> Seq Scan on product_product (cost=0.00..2548.31 rows=16954 width=1066) (actual time=0.036..44.138 rows=16946 loops=1)
Filter: (product_product.active AND (product_product.site_id = 1))
Total runtime: 1182.515 ms

查询 2

同上但不是按django_site.domain排序

查询计划

 Sort  (cost=3909.83..3952.21 rows=16954 width=1066) (actual time=257.094..278.905 rows=16946 loops=1)
Sort Key: product_product.ordering, product_product.name
Sort Method: quicksort Memory: 25161kB
-> Nested Loop (cost=0.00..2718.86 rows=16954 width=1066) (actual time=0.075..86.120 rows=16946 loops=1)
-> Seq Scan on django_site (cost=0.00..1.01 rows=1 width=4) (actual time=0.015..0.017 rows=1 loops=1)
Filter: (id = 1)
-> Seq Scan on product_product (cost=0.00..2548.31 rows=16954 width=1066) (actual time=0.052..44.024 rows=16946 loops=1)
Filter: (product_product.active AND (product_product.site_id = 1))
Total runtime: 305.392 ms

This question可能是相关的。

编辑:添加了更多细节

           Table "public.product_product"
Column | Type |
-------------+------------------------+---------
id | integer | not null default nextval('product_product_id_seq'::regclass)
site_id | integer | not null
name | character varying(255) | not null
slug | character varying(255) | not null
sku | character varying(255) |
ordering | integer | not null
[snip some columns ]

Indexes:
"product_product_pkey" PRIMARY KEY, btree (id)
"product_product_site_id_key" UNIQUE, btree (site_id, sku)
"product_product_site_id_key1" UNIQUE, btree (site_id, slug)
"product_product_site_id" btree (site_id)
"product_product_slug" btree (slug)
"product_product_slug_like" btree (slug varchar_pattern_ops)


Table "public.django_site"
Column | Type |
--------+------------------------+----------
id | integer | not null default nextval('django_site_id_seq'::regclass)
domain | character varying(100) | not null
name | character varying(50) | not null
Indexes:
"django_site_pkey" PRIMARY KEY, btree (id)

Postgres 版本为 8.4

一些表格统计信息:

# select count(*) from django_site;
count
-------
1

# select count(*) from product_product;
count
-------
17540

# select active, count(*) from product_product group by active;
active | count
--------+-------
f | 591
t | 16949

# select site_id, count(*) from product_product group by site_id;
site_id | count
---------+-------
1 | 17540

最佳答案

测试用例

PostgreSQL 9.1。测试资源有限的数据库,但对于这个小案例来说已经足够了。整理的语言环境将是相关的:

SHOW LC_COLLATE;

de_AT.UTF-8

第一步)重构原始测试环境

-- DROP TABLE x;
CREATE SCHEMA x; -- test schema

-- DROP TABLE x.django_site;
CREATE TABLE x.django_site (
id serial primary key
,domain character varying(100) not null
,int_col int not null
);
INSERT INTO x.django_site values (1,'www.testsite.com/foodir/', 3);

-- DROP TABLE x.product;
CREATE TABLE x.product (
id serial primary key
,site_id integer not null
,name character varying(255) not null
,slug character varying(255) not null
,sku character varying(255)
,ordering integer not null
,active boolean not null
);

INSERT INTO x.product (site_id, name, slug, sku, ordering, active)
SELECT 1
,repeat(chr((random() * 255)::int + 32), (random()*255)::int)
,repeat(chr((random() * 255)::int + 32), (random()*255)::int)
,repeat(chr((random() * 255)::int + 32), (random()*255)::int)
,i -- ordering in sequence
,NOT (random()* 0.5174346569119122)::int::bool
FROM generate_series(1, 17540) AS x(i);
-- SELECT ((591::float8 / 17540)* 0.5) / (1 - (591::float8 / 17540))
-- = 0.5174346569119122

CREATE INDEX product_site_id on x.product(site_id);

第 2 步)分析

    ANALYZE x.product;
ANALYZE x.django_site;

第 3 步)通过 random() 重新排序

-- DROP TABLE x.p;
CREATE TABLE x.p AS
SELECT *
FROM x.product
ORDER BY random();

ANALYZE x.p;

结果

EXPLAIN ANALYZE
SELECT p.*
FROM x.p
JOIN x.django_site d ON (p.site_id = d.id)
WHERE p.active
AND p.site_id = 1
-- ORDER BY d.domain, p.ordering, p.name
-- ORDER BY p.ordering, p.name
-- ORDER BY d.id, p.ordering, p.name
-- ORDER BY d.int_col, p.ordering, p.name
-- ORDER BY p.name COLLATE "C"
-- ORDER BY d.domain COLLATE "C", p.ordering, p.name -- dvd's final solution

1) Pre ANALYZE (-> 位图索引扫描)
2) 分析后(-> 序列扫描)
3) 通过random()重新排序,ANALYZE

ORDER  BY d.domain, p.ordering, p.name

1) 总运行时间:1253.543 毫秒
2) 总运行时间:1250.351 毫秒
3) 总运行时间:1283.111 毫秒

ORDER  BY p.ordering, p.name

1) 总运行时间:177.266 毫秒
2) 总运行时间:174.556 毫秒
3) 总运行时间:177.797 毫秒

ORDER  BY d.id, p.ordering, p.name

1) 总运行时间:176.628 毫秒
2) 总运行时间:176.811 毫秒
3) 总运行时间:178.150 毫秒
规划器显然考虑到 d.id 在功能上是依赖的。

ORDER  BY d.int_col, p.ordering, p.name -- integer column in other table

1) Total runtime: 242.218 ms -- !!
2) Total runtime: 245.234 ms
3) Total runtime: 254.581 ms
The planner obviously misses that d.int_col (NOT NULL) is just as functionally dependent. But sorting by an integer column is cheap.

ORDER  BY p.name -- varchar(255) in same table

1) Total runtime: 2259.171 ms -- !!
2) Total runtime: 2257.650 ms
3) Total runtime: 2258.282 ms
Sorting by a (long) varchar or text column is expensive ...

ORDER  BY p.name COLLATE "C"

1) Total runtime: 327.516 ms -- !!
2) Total runtime: 325.103 ms
3) Total runtime: 327.206 ms
... but not as expensive if done without locale.

With the locale out of the way, sorting by a varchar column is not quite but almost as fast. Locale "C" is effectively "no locale, just order by byte value". I quote the manual:

If you want the system to behave as if it had no locale support, use the special locale name C, or equivalently POSIX.


Putting it all together, @dvd chose:

ORDER  BY d.domain COLLATE "C", p.ordering, p.name

...3) 总运行时间:275.854 毫秒
应该可以。

关于sql - 按联接表中的列排序的慢查询,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/9888096/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com