PostgreSQL query plan differences

Reposted by: 行者123 | Updated: 2023-11-29 11:36:17

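I'm trying to debug a query that runs slowly in production but fast on my development machine. My dev box has a snapshot of the prod database that is only a few days old, so the two databases have roughly the same contents.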

The query is:

select count(*) from big_table where search_column in ('something')

Notes:

big_table is a snapshot materialized view of about 35 million rows, refreshed daily.
search_column has a B-tree index.
Prod runs PostgreSQL 9.1 on Ubuntu.
Dev runs PostgreSQL 9.0 on OS X.

Query plans

Output of EXPLAIN ANALYZE:

Prod:

QUERY PLAN                                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1119843.20..1119843.21 rows=1 width=0) (actual time=467388.276..467388.278 rows=1 loops=1)
   ->  Bitmap Heap Scan on big_table  (cost=10432.55..1118804.45 rows=415497 width=0) (actual time=116891.126..466949.331 rows=210053 loops=1)
         Recheck Cond: ((search_column)::text = 'something'::text)
         ->  Bitmap Index Scan on big_table_search_column_index  (cost=0.00..10328.68 rows=415497 width=0) (actual time=8467.901..8467.901 rows=337164 loops=1)
               Index Cond: ((search_column)::text = 'something'::text)
 Total runtime: 467389.534 ms
(6 rows)

Dev:

QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=524011.38..524011.39 rows=1 width=0) (actual time=209.852..209.852 rows=1 loops=1)
   ->  Bitmap Heap Scan on big_table  (cost=5131.43..523531.22 rows=192064 width=0) (actual time=33.792..194.730 rows=209551 loops=1)
         Recheck Cond: ((search_column)::text = 'something'::text)
         ->  Bitmap Index Scan on big_table_search_column_index  (cost=0.00..5083.42 rows=192064 width=0) (actual time=27.568..27.568 rows=209551 loops=1)
               Index Cond: ((search_column)::text = 'something'::text)
 Total runtime: 209.938 ms
(6 rows)

The two queries actually return 210053 rows (prod) and 209551 rows (dev).

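Although the two plans have the same structure, and this table has roughly the same number of rows in each database, what could explain the difference in costs above?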
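One way to see where the prod time goes is to subtract each child node's time from its parent's, since in EXPLAIN ANALYZE a node's actual time includes the time spent in the nodes beneath it. The Python sketch below does that arithmetic on the timing fragments copied from the two plans; the helper names are hypothetical and not part of any PostgreSQL tool.

```python
import re

# Matches "actual time=<startup>..<total> rows=<n>" (times in milliseconds).
TIMING = re.compile(r"actual time=(\d+\.\d+)\.\.(\d+\.\d+) rows=(\d+)")

# Fragments copied from the EXPLAIN ANALYZE output above.
PLANS = {
    "prod": {
        "heap": "actual time=116891.126..466949.331 rows=210053 loops=1",
        "index": "actual time=8467.901..8467.901 rows=337164 loops=1",
        "total_ms": 467389.534,
    },
    "dev": {
        "heap": "actual time=33.792..194.730 rows=209551 loops=1",
        "index": "actual time=27.568..27.568 rows=209551 loops=1",
        "total_ms": 209.938,
    },
}


def parse(fragment):
    """Return (total_ms, rows) for one plan node."""
    m = TIMING.search(fragment)
    if m is None:
        raise ValueError(f"no timing found in {fragment!r}")
    return float(m.group(2)), int(m.group(3))


def breakdown(plan):
    heap_ms, heap_rows = parse(plan["heap"])
    index_ms, index_rows = parse(plan["index"])
    # The Bitmap Heap Scan's time includes its child Bitmap Index Scan,
    # so subtract the child to get the time spent visiting heap pages.
    heap_only_ms = heap_ms - index_ms
    return {
        "index_ms": index_ms,
        "heap_only_ms": heap_only_ms,
        "heap_share": heap_only_ms / plan["total_ms"],
        # TIDs returned by the index that did not come out of the heap scan
        # (e.g. entries pointing at dead tuples).
        "tids_dropped": index_rows - heap_rows,
    }


if __name__ == "__main__":
    prod, dev = breakdown(PLANS["prod"]), breakdown(PLANS["dev"])
    for name, b in (("prod", prod), ("dev", dev)):
        print(f"{name}: index {b['index_ms']:.0f} ms, heap {b['heap_only_ms']:.0f} ms "
              f"({b['heap_share']:.1%} of runtime), TIDs dropped {b['tids_dropped']}")
    print(f"heap slowdown x{prod['heap_only_ms'] / dev['heap_only_ms']:.0f}, "
          f"index slowdown x{prod['index_ms'] / dev['index_ms']:.0f}")
```

On these figures roughly 458 s of the 467 s prod runtime is spent in the heap scan and only about 8.5 s in the index scan, so the slowdown lies overwhelmingly in fetching table pages rather than in reading the index.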

Bloat

As suggested by @bma, here are the results of the "bloat" query for the relevant table and index on prod and dev:

Prod:

current_database | schemaname |            tablename            | tbloat | wastedbytes |                             iname                             | ibloat | wastedibytes 
------------------+------------+---------------------------------+--------+-------------+---------------------------------------------------------------+--------+--------------
my_db            | public     | big_table                       |    1.6 |  7965433856 | big_table_search_column_index                                 |    0.1 |            0

Dev:

current_database | schemaname |            tablename            | tbloat | wastedbytes |                             iname                             | ibloat | wastedibytes 
------------------+------------+---------------------------------+--------+-------------+---------------------------------------------------------------+--------+--------------
my_db            | public     | big_table                       |    0.8 |           0 | big_table_search_column_index                                 |    0.1 |            0

Voilà, there's a difference.

I have run vacuum analyze big_table; but it has not made any noticeable difference to the run time of the count query.
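The bloat figures can be turned into rough page counts. This sketch assumes that the bloat query used here is the widely circulated one from the PostgreSQL wiki, where tbloat = relpages / otta (otta being the estimated minimal page count) and wastedbytes = block_size × (relpages − otta), and that the default 8 KiB block size is in use. Because tbloat is rounded to one decimal, the derived sizes are only approximate.

```python
BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size (8 KiB)
GIB = 1024 ** 3


def bloat_estimate(tbloat, wastedbytes, block_size=BLOCK_SIZE):
    """Back out page counts, assuming tbloat = relpages / otta and
    wastedbytes = block_size * (relpages - otta)."""
    wasted_pages = wastedbytes // block_size
    if tbloat <= 1.0:
        # The assumed query reports 0 wasted bytes when relpages < otta,
        # so the expected size cannot be recovered from these two numbers.
        return {"wasted_pages": wasted_pages, "expected_pages": None, "actual_pages": None}
    expected_pages = wasted_pages / (tbloat - 1.0)
    return {
        "wasted_pages": wasted_pages,
        "expected_pages": expected_pages,
        "actual_pages": expected_pages + wasted_pages,
    }


if __name__ == "__main__":
    prod = bloat_estimate(1.6, 7965433856)
    print(f"wasted: {prod['wasted_pages']} pages "
          f"({prod['wasted_pages'] * BLOCK_SIZE / GIB:.2f} GiB)")
    print(f"expected ~{prod['expected_pages'] * BLOCK_SIZE / GIB:.1f} GiB, "
          f"actual ~{prod['actual_pages'] * BLOCK_SIZE / GIB:.1f} GiB")
```

For prod this gives 972343 wasted pages (about 7.4 GiB) in a table of roughly 20 GiB whose estimated minimal size is about 12 GiB. Plain VACUUM only marks dead space as reusable; it neither shrinks the file nor reorders rows, which fits the unchanged count timing. VACUUM FULL and CLUSTER rewrite the table.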

Configuration

As suggested by bma, here is the output of:

SELECT name, current_setting(name), source FROM pg_settings WHERE source NOT IN ('default', 'override');

Prod:

            name            |         current_setting          |        source        
----------------------------+----------------------------------+----------------------
 application_name           | psql                             | client
 DateStyle                  | ISO, MDY                         | configuration file
 default_text_search_config | pg_catalog.english               | configuration file
 effective_cache_size       | 6GB                              | configuration file
 external_pid_file          | /var/run/postgresql/9.1-main.pid | configuration file
 listen_addresses           | *                                | configuration file
 log_line_prefix            | %t                               | configuration file
 log_timezone               | localtime                        | environment variable
 max_connections            | 100                              | configuration file
 max_stack_depth            | 2MB                              | environment variable
 port                       | 5432                             | configuration file
 shared_buffers             | 2GB                              | configuration file
 ssl                        | on                               | configuration file
 TimeZone                   | localtime                        | environment variable
 unix_socket_directory      | /var/run/postgresql              | configuration file
(15 rows)

Dev:

            name            |     current_setting     |        source        
----------------------------+-------------------------+----------------------
 application_name           | psql                    | client
 DateStyle                  | ISO, MDY                | configuration file
 default_text_search_config | pg_catalog.english      | configuration file
 effective_cache_size       | 4GB                     | configuration file
 lc_messages                | en_US                   | configuration file
 lc_monetary                | en_US                   | configuration file
 lc_numeric                 | en_US                   | configuration file
 lc_time                    | en_US                   | configuration file
 listen_addresses           | *                       | configuration file
 log_destination            | syslog                  | configuration file
 log_directory              | ../var                  | configuration file
 log_filename               | postgresql-%Y-%m-%d.log | configuration file
 log_line_prefix            | %t                      | configuration file
 log_statement              | all                     | configuration file
 log_timezone               | Australia/Hobart        | command line
 logging_collector          | on                      | configuration file
 maintenance_work_mem       | 512MB                   | configuration file
 max_connections            | 50                      | configuration file
 max_stack_depth            | 2MB                     | environment variable
 shared_buffers             | 2GB                     | configuration file
 ssl                        | off                     | configuration file
 synchronous_commit         | off                     | configuration file
 TimeZone                   | Australia/Hobart        | command line
 timezone_abbreviations     | Default                 | command line
 work_mem                   | 100MB                   | configuration file
(25 rows)
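With two listings of different lengths it is easy to miss which settings actually differ. The following sketch parses psql's aligned name | current_setting | source output and lists the differences; it runs on a hand-picked excerpt of the two listings above, and the function names are hypothetical. A setting missing from one side (shown as None) is simply not listed there, meaning it is at its default.

```python
PROD_EXCERPT = """
            name            |         current_setting          |        source
----------------------------+----------------------------------+----------------------
 effective_cache_size       | 6GB                              | configuration file
 max_connections            | 100                              | configuration file
 shared_buffers             | 2GB                              | configuration file
"""

DEV_EXCERPT = """
            name            |     current_setting     |        source
----------------------------+-------------------------+----------------------
 effective_cache_size       | 4GB                     | configuration file
 maintenance_work_mem       | 512MB                   | configuration file
 max_connections            | 50                      | configuration file
 shared_buffers             | 2GB                     | configuration file
 work_mem                   | 100MB                   | configuration file
"""


def parse_settings(psql_output):
    """Map setting name -> current_setting from aligned psql output."""
    settings = {}
    for line in psql_output.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip the header row; the ---+--- separator and any "(N rows)"
        # footer contain no "|" and so yield a single part.
        if len(parts) != 3 or parts[0] == "name":
            continue
        settings[parts[0]] = parts[1]
    return settings


def diff_settings(a, b):
    """Yield (name, a_value, b_value) for every setting whose value differs."""
    for name in sorted(set(a) | set(b), key=str.lower):
        if a.get(name) != b.get(name):
            yield name, a.get(name), b.get(name)


if __name__ == "__main__":
    prod, dev = parse_settings(PROD_EXCERPT), parse_settings(DEV_EXCERPT)
    for name, p, d in diff_settings(prod, dev):
        print(f"{name:22} prod={p}  dev={d}")
```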

Best answer

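A wild guess (a bit too long for a comment...): perhaps, because of the data distribution, the query plan used to refresh the mat view is very different on the two machines, so the mat view gets populated in a completely different way.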

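This could end up yielding similar bitmap index scan plans, with the scan conveniently hitting a select few disk pages on the dev install but a great many disk pages in production.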

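If this lead makes sense to you, could you also post the query plans of the queries actually used to create/refresh the mat view? If they differ a lot (cost estimates, plans, etc.), try adding a clustered index on the mat view (possibly on search_column itself) and see whether it makes any material difference. (Don't forget to analyze afterwards.)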
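To illustrate how physical row order alone could produce the gap the answer describes, here is a toy model (not PostgreSQL code) that counts the distinct heap pages a bitmap heap scan would have to visit to fetch the matching rows. The 35 million total rows and roughly 210,000 matches come from the question; ROWS_PER_PAGE = 100 is an assumption, since the real figure depends on row width and bloat.

```python
TOTAL_ROWS = 35_000_000   # approximate table size from the question
MATCHING_ROWS = 210_000   # approximate number of rows matching 'something'
ROWS_PER_PAGE = 100       # assumption: real value depends on row width


def pages_touched(row_positions, rows_per_page=ROWS_PER_PAGE):
    """Number of distinct heap pages holding the given physical row positions."""
    return len({pos // rows_per_page for pos in row_positions})


def clustered_positions(n_matching):
    """Matching rows stored contiguously, as right after CLUSTER on search_column."""
    return range(n_matching)


def scattered_positions(n_matching, total_rows):
    """Matching rows spread evenly through the whole table."""
    stride = total_rows // n_matching
    return range(0, n_matching * stride, stride)


if __name__ == "__main__":
    clustered = pages_touched(clustered_positions(MATCHING_ROWS))
    scattered = pages_touched(scattered_positions(MATCHING_ROWS, TOTAL_ROWS))
    print(f"clustered: {clustered} pages, scattered: {scattered} pages")
```

In this model the same 210,000 rows need 2,100 page visits when clustered and 210,000 when spread out, a 100x difference caused purely by layout; each uncached page can mean a separate random read. In PostgreSQL, CLUSTER reorders a table once and does not keep later inserts in order, so a table rebuilt by the daily refresh would presumably need to be re-clustered (and analyzed) after each rebuild.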

Regarding PostgreSQL query plan differences, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/19149585/
