sql - PostgreSQL query not using index

Reposted by: 行者123 Updated: 2023-11-29 12:42:54

Environment

My PostgreSQL (9.2) schema looks like this:

CREATE TABLE first
(
   id_first bigint NOT NULL,
   first_date timestamp without time zone NOT NULL,
   CONSTRAINT first_pkey PRIMARY KEY (id_first)
)
WITH (
   OIDS=FALSE
);

CREATE INDEX first_first_date_idx
   ON first
   USING btree
     (first_date);

CREATE TABLE second
(
   id_second bigint NOT NULL,
   id_first bigint NOT NULL,
   CONSTRAINT second_pkey PRIMARY KEY (id_second),
   CONSTRAINT fk_first FOREIGN KEY (id_first)
      REFERENCES first (id_first) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
   OIDS=FALSE
);

CREATE INDEX second_id_first_idx
   ON second
   USING btree
   (id_first);

CREATE TABLE third
(
   id_third bigint NOT NULL,
   id_second bigint NOT NULL,
   CONSTRAINT third_pkey PRIMARY KEY (id_third),
   CONSTRAINT fk_second FOREIGN KEY (id_second)
      REFERENCES second (id_second) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
   OIDS=FALSE
);

CREATE INDEX third_id_second_idx
   ON third
   USING btree
   (id_second);

So I have 3 tables, each with its own PK. First has an index on first_date. Second has an FK referencing First, with an index on it. Third has an FK referencing Second, with an index on it:

 First (0 --> n) Second (0 --> n) Third

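The First table contains about 10 000 000 records. The Second table contains about 20 000 000 records. The Third table contains about 18 000 000 records.

Dates in the first_date column range from 2016-01-01 to today.

random_page_cost is set to 2.0. default_statistics_target is set to 100. STATISTICS for all FK, PK and first_date columns is set to 5000.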
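As a rough sketch of how settings like these are typically applied: the statements below are standard PostgreSQL commands, but the question does not show what the author actually ran (the values may equally have been set in postgresql.conf), so treat this as an assumption.

```sql
-- Planner cost setting for the current session (the question reports 2.0).
SET random_page_cost = 2.0;

-- Per-column statistics target, overriding default_statistics_target (100).
ALTER TABLE first  ALTER COLUMN id_first   SET STATISTICS 5000;
ALTER TABLE first  ALTER COLUMN first_date SET STATISTICS 5000;
ALTER TABLE second ALTER COLUMN id_second  SET STATISTICS 5000;
ALTER TABLE second ALTER COLUMN id_first   SET STATISTICS 5000;
ALTER TABLE third  ALTER COLUMN id_third   SET STATISTICS 5000;
ALTER TABLE third  ALTER COLUMN id_second  SET STATISTICS 5000;

-- A new statistics target only takes effect after the next ANALYZE.
ANALYZE first;
ANALYZE second;
ANALYZE third;
```

A higher statistics target makes ANALYZE sample more rows and keep larger histograms, which can improve row estimates at the price of slower ANALYZE runs and slightly longer planning.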

Task

I want to count all Third rows related to First rows where first_date < X.

My query:

SELECT count(t.id_third) AS count
FROM first f
JOIN second s ON s.id_first = f.id_first 
JOIN third t ON t.id_second = s.id_second
WHERE first_date < _my_date

Problem description

Asking for 2 days - _my_date = '2016-01-03'

Everything works fine. The query takes 1-2 seconds. EXPLAIN ANALYZE:

"Aggregate  (cost=8585512.55..8585512.56 rows=1 width=8) (actual time=67.310..67.310 rows=1 loops=1)"
"  ->  Merge Join  (cost=4208477.49..8583088.04 rows=969805 width=8) (actual time=44.277..65.948 rows=17631 loops=1)"
"        Merge Cond: (s.id_second = t.id_second)"
"        ->  Sort  (cost=4208477.48..4211121.75 rows=1057709 width=8) (actual time=44.263..46.035 rows=19230 loops=1)"
"              Sort Key: s.id_second"
"              Sort Method: quicksort  Memory: 1670kB"
"              ->  Nested Loop  (cost=0.01..4092310.41 rows=1057709 width=8) (actual time=6.169..39.183 rows=19230 loops=1)"
"                    ->  Index Scan using first_first_date_idx on first f  (cost=0.01..483786.81 rows=492376 width=8)  (actual time=6.159..12.223 rows=10346 loops=1)"
"                          Index Cond: (first_date < '2016-01-03 00:00:00'::timestamp without time zone)"
"                    ->  Index Scan using second_id_first_idx on second s  (cost=0.00..7.26 rows=7 width=16) (actual time=0.002..0.002 rows=2 loops=10346)"
"                          Index Cond: (id_first = f.id_first)"
"        ->  Index Scan using third_id_second_idx on third t  (cost=0.00..4316649.89 rows=17193788 width=16) (actual time=0.008..7.293 rows=17632 loops=1)"
"Total runtime: 67.369 ms"

Asking for 10 days or more - _my_date = '2016-01-11' or later

The query no longer uses an index scan. It is replaced by a seq scan and takes 3-4 minutes. Query plan:

"Aggregate  (cost=8731468.75..8731468.76 rows=1 width=8) (actual time=234411.229..234411.229 rows=1 loops=1)"
"  ->  Hash Join  (cost=4352424.81..8728697.88 rows=1108348 width=8) (actual time=189670.068..234400.540 rows=138246 loops=1)"
"        Hash Cond: (t.id_second = o.id_second)"
"        ->  Seq Scan on third t  (cost=0.00..4128080.88 rows=17193788 width=16) (actual time=0.016..124111.453 rows=17570724 loops=1)"
"        ->  Hash  (cost=4332592.69..4332592.69 rows=1208810 width=8) (actual time=98566.740..98566.740 rows=151263 loops=1)"
"              Buckets: 16384  Batches: 16  Memory Usage: 378kB"
"              ->  Hash Join  (cost=561918.25..4332592.69 rows=1208810 width=8) (actual time=6535.801..98535.915 rows=151263 loops=1)"
"                    Hash Cond: (s.id_first = f.id_first)"
"                    ->  Seq Scan on second s  (cost=0.00..3432617.48 rows=18752248 width=16) (actual time=6090.771..88891.691 rows=19132869 loops=1)"
"                    ->  Hash  (cost=552685.31..552685.31 rows=562715 width=8) (actual time=444.630..444.630 rows=81650 loops=1)"
"                          ->  Index Scan using first_first_date_idx on first f  (cost=0.01..552685.31 rows=562715 width=8) (actual time=7.987..421.087 rows=81650 loops=1)"
"                                Index Cond: (first_date < '2016-01-13 00:00:00'::timestamp without time zone)"
"Total runtime: 234411.303 ms"

For testing purposes, I set:

 SET enable_seqscan = OFF;

With that, my query went back to using an index scan and takes 1-10 seconds (depending on the range).
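A minimal sketch of how such a test can be confined to a single transaction, so the setting cannot leak into other queries in the session. It assumes the query from the question with a literal date substituted for the _my_date placeholder; the date chosen here is only an example.

```sql
BEGIN;

-- SET LOCAL lasts only until the end of the current transaction.
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT count(t.id_third) AS count
FROM first f
JOIN second s ON s.id_first = f.id_first
JOIN third t ON t.id_second = s.id_second
WHERE f.first_date < TIMESTAMP '2016-01-13';

-- Nothing was modified; ROLLBACK also discards the SET LOCAL.
ROLLBACK;
```

Note that enable_seqscan = off does not forbid sequential scans. It only makes the planner treat them as very expensive, so it is a diagnostic switch rather than a fix.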

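Question

Why does this happen? How can I convince the query planner to use an index scan?

Edit

After lowering random_page_cost to 1.1, I can now select about 30 days and still get an index scan. The query plan changed slightly: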

"Aggregate  (cost=8071389.47..8071389.48 rows=1 width=8) (actual  time=4915.196..4915.196 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.01..8067832.28 rows=1422878 width=8) (actual time=14.402..4866.937 rows=399184 loops=1)"
"        ->  Nested Loop  (cost=0.01..3492321.55 rows=1551849 width=8) (actual time=14.393..3012.617 rows=436794 loops=1)"
"              ->  Index Scan using first_first_date_idx on first f  (cost=0.01..432541.99 rows=722404 width=8) (actual time=14.372..729.233 rows=236007 loops=1)"
"                    Index Cond: (first_date < '2016-02-01 00:00:00'::timestamp without time zone)"
"              ->  Index Scan using second_id_first_idx on second s  (cost=0.00..4.17 rows=7 width=16) (actual time=0.008..0.009 rows=2 loops=236007)"
"                    Index Cond: (second = f.id_second)"
"        ->  Index Scan using third_id_second_idx on third t  (cost=0.00..2.94 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=436794)"
"              Index Cond: (id_second = s.id_second)"
"Total runtime: 4915.254 ms"

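However, I still don't understand why asking for a larger range results in a seq scan...

Interestingly, when I ask for a range just above some kind of threshold, I get a query plan like this (here selecting 40 days; asking for more produces full seq scans again):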

"Aggregate  (cost=8403399.27..8403399.28 rows=1 width=8) (actual time=138303.216..138303.217 rows=1 loops=1)"
"  ->  Hash Join  (cost=3887619.07..8399467.63 rows=1572656 width=8) (actual time=44056.443..138261.203 rows=512062 loops=1)"
"        Hash Cond: (t.id_second = s.id_second)"
"        ->  Seq Scan on third t  (cost=0.00..4128080.88 rows=17193788 width=16) (actual time=0.004..119497.056 rows=17570724 loops=1)"
"        ->  Hash  (cost=3859478.04..3859478.04 rows=1715203 width=8) (actual time=5695.077..5695.077 rows=560503 loops=1)"
"              Buckets: 16384  Batches: 16  Memory Usage: 1390kB"
"              ->  Nested Loop  (cost=0.01..3859478.04 rows=1715203 width=8) (actual time=65.250..5533.413 rows=560503 loops=1)"
"                    ->  Index Scan using first_first_date_idx on first f  (cost=0.01..477985.28 rows=798447 width=8) (actual time=64.927..1688.341 rows=302663 loops=1)"
"                          Index Cond: (first_date < '2016-02-11 00:00:00'::timestamp without time zone)"
"                    ->  Index Scan using second_id_first_idx on second s (cost=0.00..4.17 rows=7 width=16) (actual time=0.010..0.012 rows=2 loops=302663)"
"                          Index Cond: (id_first = f.id_first)"
"Total runtime: 138303.306 ms"

Update after Laurenz Able's suggestion

Query plan after rewriting the query as Laurenz Able suggested:

"Aggregate  (cost=9102321.05..9102321.06 rows=1 width=8) (actual time=15237.830..15237.830 rows=1 loops=1)"
"  ->  Merge Join  (cost=4578171.25..9097528.19 rows=1917143 width=8) (actual time=9111.694..15156.092 rows=803657 loops=1)"
"        Merge Cond: (third.id_second = s.id_second)"
"        ->  Index Scan using third_id_second_idx on third  (cost=0.00..4270478.19 rows=17193788 width=16) (actual time=23.650..5425.137 rows=803658 loops=1)"
"        ->  Materialize  (cost=4577722.81..4588177.38 rows=2090914 width=8) (actual time=9088.030..9354.326 rows=879283 loops=1)"
"              ->  Sort  (cost=4577722.81..4582950.09 rows=2090914 width=8) (actual time=9088.023..9238.426 rows=879283 loops=1)"
"                    Sort Key: s.id_second"
"                    Sort Method: external sort  Disk: 15480kB"
"                    ->  Merge Join  (cost=673389.38..4341477.37 rows=2090914 width=8) (actual time=3662.239..8485.768 rows=879283 loops=1)"
"                          Merge Cond: (s.id_first = f.id_first)"
"                          ->  Index Scan using second_id_first_idx on second s  (cost=0.00..3587838.88 rows=18752248 width=16) (actual time=0.015..4204.308 rows=879284 loops=1)"
"                          ->  Materialize  (cost=672960.82..677827.55 rows=973345 width=8) (actual time=3662.216..3855.667 rows=892988 loops=1)"
"                                ->  Sort  (cost=672960.82..675394.19 rows=973345 width=8) (actual time=3662.213..3745.975 rows=476519 loops=1)"
"                                      Sort Key: f.id_first"
"                                      Sort Method: external sort  Disk: 8400kB"
"                                      ->  Index Scan using first_first_date_idx on first f (cost=0.01..568352.90 rows=973345 width=8) (actual time=126.386..3233.134 rows=476519 loops=1)"
"                                            Index Cond: (first_date < '2016-03-03 00:00:00'::timestamp without time zone)"
"Total runtime: 15244.404 ms"

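Best answer

First, it looks like some of the estimates are off.
Try to ANALYZE the tables and see whether that changes the chosen query plan.

It may also help to lower random_page_cost to a value just above 1 and see whether that improves the plan.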
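The two suggestions above could be tried in one session roughly as follows. This is a sketch under assumptions: the value 1.1 and the literal date are illustrative, and the query is the one from the question with _my_date replaced by a literal.

```sql
-- Refresh planner statistics for the three tables.
ANALYZE first;
ANALYZE second;
ANALYZE third;

-- Lower the estimated cost of random page reads for this session only.
SET random_page_cost = 1.1;

-- Compare estimated and actual row counts node by node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(t.id_third) AS count
FROM first f
JOIN second s ON s.id_first = f.id_first
JOIN third t ON t.id_second = s.id_second
WHERE f.first_date < TIMESTAMP '2016-01-13';

-- Return to the configured default afterwards.
RESET random_page_cost;
```

In the slow plan from the question, the scan on first is estimated at 562715 rows while 81650 are actually returned, which is the kind of gap to look for in the output.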

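Interestingly, the index scan on third_id_second_idx in the fast query produces only 17632 rows instead of more than 17 million. I can only explain this by assuming that from that row on, the values of id_second no longer match any row in the join of first and second, i.e. the merge join is finished after that point.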
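One way to check this assumption, sketched here with the date from the fast plan: find the largest id_second produced by the join of first and second, then count how many rows of third lie at or below it. The column aliases are made up for illustration.

```sql
WITH matched AS (
   -- Largest id_second that survives the join of first and second.
   SELECT max(s.id_second) AS max_id_second
   FROM first f
   JOIN second s ON s.id_first = f.id_first
   WHERE f.first_date < TIMESTAMP '2016-01-03'
)
SELECT m.max_id_second,
       -- Rows of third that a merge join would have to read before
       -- running past the last possible match.
       (SELECT count(*)
        FROM third t
        WHERE t.id_second <= m.max_id_second) AS third_rows_up_to_max
FROM matched m;
```

If the explanation holds, third_rows_up_to_max should be of the same order as the 17632 rows read from third_id_second_idx, rather than anywhere near the full table.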

You can try to exploit this early termination by rewriting the query. Try

JOIN (SELECT id_second, id_third FROM third ORDER BY id_second) t

instead of

JOIN third t

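This may lead to a better plan, because PostgreSQL will not optimize the ORDER BY away, and since third has to be sorted anyway, the planner may find a merge join cheaper. That way you trick the planner into choosing a plan that it would not otherwise consider ideal. With a different value distribution, the planner's original choice would probably be better.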
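Put together, the rewritten query would look roughly like this. The literal date is taken from the plan posted in the question's update; it stands in for the _my_date placeholder.

```sql
SELECT count(t.id_third) AS count
FROM first f
JOIN second s ON s.id_first = f.id_first
-- The ORDER BY in the subquery is kept by the planner, which gives it
-- a sorted input for third and nudges it toward a merge join.
JOIN (SELECT id_second, id_third
      FROM third
      ORDER BY id_second) t
   ON t.id_second = s.id_second
WHERE f.first_date < TIMESTAMP '2016-03-03';
```

This relies on planner behavior rather than on any guarantee, so the resulting plan is worth re-checking with EXPLAIN ANALYZE after upgrades or significant changes in the data.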

Regarding sql - PostgreSQL query not using index, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40108364/
