postgresql - How to interpret the following PostgreSQL query plan

Reposted by: 行者123  Updated: 2023-11-29 12:38:00

Please observe:

(Forgot to add the ORDER BY; the plan has been updated)

The query:

EXPLAIN ANALYZE
SELECT DISTINCT(id), special, customer, business_no, bill_to_name, bill_to_address1, bill_to_address2, bill_to_postal_code, ship_to_name, ship_to_address1, ship_to_address2, ship_to_postal_code, 
purchase_order_no, ship_date::text, calc_discount_text(o) AS discount, discount_absolute, delivery, hst_percents, sub_total, total_before_hst, hst, total, total_discount, terms, rep, ship_via, 
item_count, version, to_char(modified, 'YYYY-MM-DD HH24:MI:SS') AS "modified", to_char(created, 'YYYY-MM-DD HH24:MI:SS') AS "created"
FROM invoices o
LEFT JOIN reps ON reps.rep_id = o.rep_id
LEFT JOIN terms ON terms.terms_id = o.terms_id
LEFT JOIN shipVia ON shipVia.ship_via_id = o.ship_via_id
JOIN invoiceItems items ON items.invoice_id = o.id 
WHERE items.qty < 5
ORDER BY modified
LIMIT 100

The result:

Limit  (cost=2931740.10..2931747.85 rows=100 width=635) (actual time=414307.004..414387.899 rows=100 loops=1)
  ->  Unique  (cost=2931740.10..3076319.37 rows=1865539 width=635) (actual time=414307.001..414387.690 rows=100 loops=1)
        ->  Sort  (cost=2931740.10..2936403.95 rows=1865539 width=635) (actual time=414307.000..414325.058 rows=2956 loops=1)
              Sort Key: (to_char(o.modified, 'YYYY-MM-DD HH24:MI:SS'::text)), o.id, o.special, o.customer, o.business_no, o.bill_to_name, o.bill_to_address1, o.bill_to_address2, o.bill_to_postal_code, o.ship_to_name, o.ship_to_address1, o.ship_to_address2, (...)
              Sort Method: external merge  Disk: 537240kB
              ->  Hash Join  (cost=11579.63..620479.38 rows=1865539 width=635) (actual time=1535.805..131378.864 rows=1872673 loops=1)
                    Hash Cond: (items.invoice_id = o.id)
                    ->  Seq Scan on invoiceitems items  (cost=0.00..78363.45 rows=1865539 width=4) (actual time=0.110..4591.117 rows=1872673 loops=1)
                          Filter: (qty < 5)
                          Rows Removed by Filter: 1405763
                    ->  Hash  (cost=5498.18..5498.18 rows=64996 width=635) (actual time=1530.786..1530.786 rows=64996 loops=1)
                          Buckets: 1024  Batches: 64  Memory Usage: 598kB
                          ->  Hash Left Join  (cost=113.02..5498.18 rows=64996 width=635) (actual time=0.214..1043.207 rows=64996 loops=1)
                                Hash Cond: (o.ship_via_id = shipvia.ship_via_id)
                                ->  Hash Left Join  (cost=75.35..4566.81 rows=64996 width=607) (actual time=0.154..754.957 rows=64996 loops=1)
                                      Hash Cond: (o.terms_id = terms.terms_id)
                                      ->  Hash Left Join  (cost=37.67..3800.33 rows=64996 width=579) (actual time=0.071..506.145 rows=64996 loops=1)
                                            Hash Cond: (o.rep_id = reps.rep_id)
                                            ->  Seq Scan on invoices o  (cost=0.00..2868.96 rows=64996 width=551) (actual time=0.010..235.977 rows=64996 loops=1)
                                            ->  Hash  (cost=22.30..22.30 rows=1230 width=36) (actual time=0.044..0.044 rows=4 loops=1)
                                                  Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                                  ->  Seq Scan on reps  (cost=0.00..22.30 rows=1230 width=36) (actual time=0.027..0.032 rows=4 loops=1)
                                      ->  Hash  (cost=22.30..22.30 rows=1230 width=36) (actual time=0.067..0.067 rows=3 loops=1)
                                            Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                            ->  Seq Scan on terms  (cost=0.00..22.30 rows=1230 width=36) (actual time=0.001..0.007 rows=3 loops=1)
                                ->  Hash  (cost=22.30..22.30 rows=1230 width=36) (actual time=0.043..0.043 rows=4 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                      ->  Seq Scan on shipvia  (cost=0.00..22.30 rows=1230 width=36) (actual time=0.027..0.032 rows=4 loops=1)
Total runtime: 414488.582 ms

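This is obviously terrible. I am quite new to interpreting query plans and would like to know how to extract useful performance improvement hints from a plan like this.

EDIT 1

The query involves two kinds of entities, invoices and invoice items, in a one-to-many relationship.
An invoice item specifies its quantity within the parent invoice.
The given query returns 100 invoices that have at least one item with a quantity less than 5.

That should explain why I need DISTINCT: an invoice may have several items that satisfy the filter, but I do not want the same invoice returned more than once. Hence the DISTINCT. However, I am perfectly aware that there may be better ways to get the same semantics than DISTINCT, and I am more than willing to learn about them.

EDIT 2

Please find below the indexes that existed on the invoiceItems table at the time of the query: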

CREATE INDEX invoiceitems_invoice_id_idx ON invoiceitems (invoice_id);
CREATE INDEX invoiceitems_invoice_id_name_index ON invoiceitems (invoice_id, name varchar_pattern_ops);
CREATE INDEX invoiceitems_name_index ON invoiceitems (name varchar_pattern_ops);
CREATE INDEX invoiceitems_qty_index ON invoiceitems (qty);

EDIT 3

The advice given by https://stackoverflow.com/users/808806/yieldsfalsehood on how to eliminate DISTINCT (and why) turned out to be really good advice. Here is the new query:

EXPLAIN ANALYZE
SELECT id, special, customer, business_no, bill_to_name, bill_to_address1, bill_to_address2, bill_to_postal_code, ship_to_name, ship_to_address1, ship_to_address2, ship_to_postal_code, 
purchase_order_no, ship_date::text, calc_discount_text(o) AS discount, discount_absolute, delivery, hst_percents, sub_total, total_before_hst, hst, total, total_discount, terms, rep, ship_via, 
item_count, version, to_char(modified, 'YYYY-MM-DD HH24:MI:SS') AS "modified", to_char(created, 'YYYY-MM-DD HH24:MI:SS') AS "created"
FROM invoices o
LEFT JOIN reps ON reps.rep_id = o.rep_id
LEFT JOIN terms ON terms.terms_id = o.terms_id
LEFT JOIN shipVia ON shipVia.ship_via_id = o.ship_via_id
WHERE EXISTS (SELECT 1 FROM invoiceItems items WHERE items.invoice_id = id AND items.qty < 5)
ORDER BY modified DESC
LIMIT 100

Here is the new plan:

Limit  (cost=64717.14..64717.39 rows=100 width=635) (actual time=7830.347..7830.869 rows=100 loops=1)
  ->  Sort  (cost=64717.14..64827.01 rows=43949 width=635) (actual time=7830.334..7830.568 rows=100 loops=1)
        Sort Key: (to_char(o.modified, 'YYYY-MM-DD HH24:MI:SS'::text))
        Sort Method: top-N heapsort  Memory: 76kB
        ->  Hash Left Join  (cost=113.46..63037.44 rows=43949 width=635) (actual time=2.322..6972.679 rows=64467 loops=1)
              Hash Cond: (o.ship_via_id = shipvia.ship_via_id)
              ->  Hash Left Join  (cost=75.78..50968.72 rows=43949 width=607) (actual time=0.650..3809.276 rows=64467 loops=1)
                    Hash Cond: (o.terms_id = terms.terms_id)
                    ->  Hash Left Join  (cost=38.11..50438.25 rows=43949 width=579) (actual time=0.550..3527.558 rows=64467 loops=1)
                          Hash Cond: (o.rep_id = reps.rep_id)
                          ->  Nested Loop Semi Join  (cost=0.43..49796.28 rows=43949 width=551) (actual time=0.015..3200.735 rows=64467 loops=1)
                                ->  Seq Scan on invoices o  (cost=0.00..2868.96 rows=64996 width=551) (actual time=0.002..317.954 rows=64996 loops=1)
                                ->  Index Scan using invoiceitems_invoice_id_idx on invoiceitems items  (cost=0.43..7.61 rows=42 width=4) (actual time=0.030..0.030 rows=1 loops=64996)
                                      Index Cond: (invoice_id = o.id)
                                      Filter: (qty < 5)
                                      Rows Removed by Filter: 1
                          ->  Hash  (cost=22.30..22.30 rows=1230 width=36) (actual time=0.213..0.213 rows=4 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                ->  Seq Scan on reps  (cost=0.00..22.30 rows=1230 width=36) (actual time=0.183..0.192 rows=4 loops=1)
                    ->  Hash  (cost=22.30..22.30 rows=1230 width=36) (actual time=0.063..0.063 rows=3 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 1kB
                          ->  Seq Scan on terms  (cost=0.00..22.30 rows=1230 width=36) (actual time=0.044..0.050 rows=3 loops=1)
              ->  Hash  (cost=22.30..22.30 rows=1230 width=36) (actual time=0.096..0.096 rows=4 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 1kB
                    ->  Seq Scan on shipvia  (cost=0.00..22.30 rows=1230 width=36) (actual time=0.071..0.079 rows=4 loops=1)
Total runtime: 7832.750 ms

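Is this the best I can hope for? I restarted the server (to clear the database caches) and reran the query without EXPLAIN ANALYZE. It takes about 5 seconds. Can it be improved further? I have 65,000 invoices and 3,278,436 invoice items.

EDIT 4

Found it. I was ordering by a computed result, modified = to_char(modified, 'YYYY-MM-DD HH24:MI:SS'). Adding an index on the invoices modified field and ordering by the field itself brings the result down to under 100 ms!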
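The change described in EDIT 4 could look roughly like the sketch below. The index name invoices_modified_index is taken from the final plan, but its definition is not shown in the post, so the CREATE INDEX statement is an assumption. The column list is shortened for illustration and the final query itself was not posted.

```sql
-- Assumed definition: the plan only reveals the index name, not how it was created.
CREATE INDEX invoices_modified_index ON invoices (modified);

-- "ORDER BY o.modified" refers to the table column. A bare "ORDER BY modified"
-- resolves to the output alias "modified", that is, the to_char() text,
-- and a plain index on the column cannot be used for that sort.
SELECT o.id,
       o.customer,
       to_char(o.modified, 'YYYY-MM-DD HH24:MI:SS') AS "modified"
FROM invoices o
WHERE EXISTS (SELECT 1
              FROM invoiceItems items
              WHERE items.invoice_id = o.id
                AND items.qty < 5)
ORDER BY o.modified DESC
LIMIT 100;
```

With the sort key matching an indexed column, the planner can walk the index backward and stop after 100 qualifying invoices, which is consistent with the "Index Scan Backward using invoices_modified_index" node in the final plan.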

The final plan is:

Limit  (cost=1.18..1741.92 rows=100 width=635) (actual time=3.002..27.065 rows=100 loops=1)
  ->  Nested Loop Left Join  (cost=1.18..765042.09 rows=43949 width=635) (actual time=2.989..25.989 rows=100 loops=1)
        ->  Nested Loop Left Join  (cost=1.02..569900.41 rows=43949 width=607) (actual time=0.413..16.863 rows=100 loops=1)
              ->  Nested Loop Left Join  (cost=0.87..386185.48 rows=43949 width=579) (actual time=0.333..15.694 rows=100 loops=1)
                    ->  Nested Loop Semi Join  (cost=0.72..202470.54 rows=43949 width=551) (actual time=0.017..13.965 rows=100 loops=1)
                          ->  Index Scan Backward using invoices_modified_index on invoices o  (cost=0.29..155543.23 rows=64996 width=551) (actual time=0.003..4.543 rows=100 loops=1)
                          ->  Index Scan using invoiceitems_invoice_id_idx on invoiceitems items  (cost=0.43..7.61 rows=42 width=4) (actual time=0.079..0.079 rows=1 loops=100)
                                Index Cond: (invoice_id = o.id)
                                Filter: (qty < 5)
                                Rows Removed by Filter: 1
                    ->  Index Scan using reps_pkey on reps  (cost=0.15..4.17 rows=1 width=36) (actual time=0.007..0.008 rows=1 loops=100)
                          Index Cond: (rep_id = o.rep_id)
              ->  Index Scan using terms_pkey on terms  (cost=0.15..4.17 rows=1 width=36) (actual time=0.003..0.004 rows=1 loops=100)
                    Index Cond: (terms_id = o.terms_id)
        ->  Index Scan using shipvia_pkey on shipvia  (cost=0.15..4.17 rows=1 width=36) (actual time=0.006..0.008 rows=1 loops=100)
              Index Cond: (ship_via_id = o.ship_via_id)
Total runtime: 27.572 ms

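Awesome! Thanks everyone for the help.

Best answer

For starters, it is pretty standard to post explain plans to http://explain.depesz.com. That adds some nice formatting, gives you a convenient way to share the plan, and lets you anonymize plans that might contain sensitive data. Even if you are not sharing the plan, it makes it a lot easier to understand what is happening and can sometimes show exactly where a bottleneck is.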

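There are countless resources that cover the details of interpreting postgres explain plans (see https://wiki.postgresql.org/wiki/Using_EXPLAIN). A lot of small details are taken into account when the database chooses a plan, but a few general concepts make it easier. First, get a grasp of the page-based layout of data and indexes (you do not need to know the details of the page format, just how data and indexes are split into pages). From there, get a feel for the two basic data access methods, full table scans and index scans. With a little thought it should start to become clear in which situations one is preferred over the other (also keep in mind that an index scan is not always possible). At that point you can start looking into the configuration items that affect plan selection, in terms of how they tip the scale in favor of a table scan or an index scan.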
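As a rough illustration of the configuration items mentioned above, the following sketch inspects a few standard planner cost settings and then discourages sequential scans for a single transaction, so that the resulting plan can be compared with the default one. The query is a simplified stand-in for the filter from the question, not the original statement.

```sql
-- Cost settings the planner uses when weighing a table scan against an index scan.
SHOW seq_page_cost;
SHOW random_page_cost;
SHOW effective_cache_size;

BEGIN;
-- SET LOCAL lasts only until the end of this transaction.
-- Turning enable_seqscan off does not forbid sequential scans;
-- it only makes the planner treat them as very expensive.
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT invoice_id
FROM invoiceitems
WHERE qty < 5;

ROLLBACK;
```

In the first plan from the question the filter qty < 5 keeps 1,872,673 rows and removes 1,405,763, so more than half of the table matches. That is the kind of situation where a sequential scan tends to be preferred over an index scan on qty.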

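Once you have that down, move up the plan and read about the different nodes you find. This plan has a lot of hash joins, so read up on those first. Then, to compare apples to apples, disable hash joins entirely ("set enable_hashjoin = false;") and run your explain analyze again. What join method do you see now? Read up on that one too. Compare the estimated cost of that method with the estimated cost of the hash join. Why might they differ? The estimated cost of the second plan will be higher than that of the first (otherwise it would have been preferred in the first place), but what about the actual time it takes to run the second plan? Is it lower or higher?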
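The experiment with disabled hash joins might be run as in the sketch below. The column list is shortened for illustration, and wrapping the statements in a transaction with SET LOCAL is a choice made here so that the setting does not leak into the rest of the session.

```sql
BEGIN;
-- Applies only inside this transaction. The planner is discouraged from
-- using hash joins and falls back to merge joins or nested loops.
SET LOCAL enable_hashjoin = off;

-- EXPLAIN ANALYZE really executes the query, so expect it to take
-- as long as a normal run.
EXPLAIN ANALYZE
SELECT DISTINCT o.id, o.customer, o.modified
FROM invoices o
JOIN invoiceItems items ON items.invoice_id = o.id
WHERE items.qty < 5
ORDER BY o.modified
LIMIT 100;

ROLLBACK;
```

Comparing the "cost=" figures of both plans shows what the planner expected, while the "actual time=" figures show what really happened.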

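Finally, to address this plan specifically. Regarding the sort that is taking so long: distinct is not a function. "DISTINCT(id)" does not say "give me all the rows that are distinct on the column id only". Instead, the rows are sorted and de-duplicated on all columns in the output (that is, it is equivalent to writing "distinct id ..."). You should probably reconsider whether you actually need that distinct at all. Normalization tends to design away the need for distinct, and while it is occasionally required, it is worth checking each time whether it really is.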
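The point about DISTINCT can be checked with a small, self-contained sketch. The table demo_items and its contents are made up for illustration and are not part of the question's schema.

```sql
-- Hypothetical demo table: id 1 appears twice with different quantities.
CREATE TEMP TABLE demo_items (id int, qty int);
INSERT INTO demo_items VALUES (1, 2), (1, 3), (2, 4);

-- The parentheses merely group the expression "id".
-- DISTINCT still applies to the whole output row.
SELECT DISTINCT (id), qty FROM demo_items;

-- Equivalent query: returns all three rows, because every (id, qty) pair is unique.
SELECT DISTINCT id, qty FROM demo_items;

-- PostgreSQL-specific DISTINCT ON keeps one row per id:
-- here (1, 2) and (2, 4), the lowest qty for each id.
SELECT DISTINCT ON (id) id, qty
FROM demo_items
ORDER BY id, qty;
```

For the invoice query, the EXISTS form shown in EDIT 3 avoids the problem entirely, since the join to invoice items no longer multiplies invoice rows.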

Regarding postgresql - How to interpret the following PostgreSQL query plan, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20424053/
