gpt4 book ai didi

Postgresql 查询计划差异

转载 作者:行者123 更新时间:2023-11-29 11:36:17 27 4
gpt4 key购买 nike

我正在尝试调试在生产环境中运行缓慢但在我的开发机器上运行速度很快的查询。我的开发箱有一个 prod 数据库的快照,它只有几天的历史,所以两个数据库的内容大致相同。

查询是:

select count(*) from big_table where search_column in ('something')

注意事项:

  • big_table 是一个 snapshot materialized view约 3500 万行,每天刷新
  • search_column 有一个 b 树索引。
  • 产品在 ubuntu 上是 9.1
  • dev 在 OS X 上是 9.0

查询计划

解释分析的结果:

产品:

QUERY PLAN                                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1119843.20..1119843.21 rows=1 width=0) (actual time=467388.276..467388.278 rows=1 loops=1)
-> Bitmap Heap Scan on big_table (cost=10432.55..1118804.45 rows=415497 width=0) (actual time=116891.126..466949.331 rows=210053 loops=1)
Recheck Cond: ((search_column)::text = 'something'::text)
-> Bitmap Index Scan on big_table_search_column_index (cost=0.00..10328.68 rows=415497 width=0) (actual time=8467.901..8467.901 rows=337164 loops=1)
Index Cond: ((search_column)::text = 'something'::text)
Total runtime: 467389.534 ms
(6 rows)

开发:

QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=524011.38..524011.39 rows=1 width=0) (actual time=209.852..209.852 rows=1 loops=1)
-> Bitmap Heap Scan on big_table (cost=5131.43..523531.22 rows=192064 width=0) (actual time=33.792..194.730 rows=209551 loops=1)
Recheck Cond: ((search_column)::text = 'something'::text)
-> Bitmap Index Scan on big_table_search_column_index (cost=0.00..5083.42 rows=192064 width=0) (actual time=27.568..27.568 rows=209551 loops=1)
Index Cond: ((search_column)::text = 'something'::text)
Total runtime: 209.938 ms
(6 rows)

而prod和dev这两个查询的实际结果分别是210053行和209551行。

虽然这两个计划的结构相同,但是考虑到每个数据库中此表中的行数大致相同,有什么可能解释上述成本的差异?

膨胀

根据@bma 的建议,这里是对 prod 和 dev 以及相关表/索引的“膨胀”查询的结果:

产品:

current_database | schemaname |            tablename            | tbloat | wastedbytes |                             iname                             | ibloat | wastedibytes 
------------------+------------+---------------------------------+--------+-------------+---------------------------------------------------------------+--------+--------------
my_db | public | big_table | 1.6 | 7965433856 | big_table_search_column_index | 0.1 | 0

开发:

current_database | schemaname |            tablename            | tbloat | wastedbytes |                             iname                             | ibloat | wastedibytes 
------------------+------------+---------------------------------+--------+-------------+---------------------------------------------------------------+--------+--------------
my_db | public | big_table | 0.8 | 0 | big_table_search_column_index | 0.1 | 0

瞧,这里有区别。

我已经运行了 vacuum analyze big_table; 但这似乎与计数查询的运行时间没有任何显着差异。

配置

根据 bma 的建议,SELECT name, current_setting(name), source FROM pg_settings WHERE source NOT IN ('default', 'override'); 的结果:

产品:

            name            |         current_setting          |        source        
----------------------------+----------------------------------+----------------------
application_name | psql | client
DateStyle | ISO, MDY | configuration file
default_text_search_config | pg_catalog.english | configuration file
effective_cache_size | 6GB | configuration file
external_pid_file | /var/run/postgresql/9.1-main.pid | configuration file
listen_addresses | * | configuration file
log_line_prefix | %t | configuration file
log_timezone | localtime | environment variable
max_connections | 100 | configuration file
max_stack_depth | 2MB | environment variable
port | 5432 | configuration file
shared_buffers | 2GB | configuration file
ssl | on | configuration file
TimeZone | localtime | environment variable
unix_socket_directory | /var/run/postgresql | configuration file
(15 rows)

开发:

            name            |     current_setting     |        source        
----------------------------+-------------------------+----------------------
application_name | psql | client
DateStyle | ISO, MDY | configuration file
default_text_search_config | pg_catalog.english | configuration file
effective_cache_size | 4GB | configuration file
lc_messages | en_US | configuration file
lc_monetary | en_US | configuration file
lc_numeric | en_US | configuration file
lc_time | en_US | configuration file
listen_addresses | * | configuration file
log_destination | syslog | configuration file
log_directory | ../var | configuration file
log_filename | postgresql-%Y-%m-%d.log | configuration file
log_line_prefix | %t | configuration file
log_statement | all | configuration file
log_timezone | Australia/Hobart | command line
logging_collector | on | configuration file
maintenance_work_mem | 512MB | configuration file
max_connections | 50 | configuration file
max_stack_depth | 2MB | environment variable
shared_buffers | 2GB | configuration file
ssl | off | configuration file
synchronous_commit | off | configuration file
TimeZone | Australia/Hobart | command line
timezone_abbreviations | Default | command line
work_mem | 100MB | configuration file
(25 rows)

最佳答案

大胆的猜测(评论有点太长了......):可能是由于数据分布,用于刷新 mat View 的查询计划非常不同,导致 mat View 以完全不同的方式填充.

这最终可能会产生类似的位图索引扫描计划,但后者可以方便地访问开发安装中选定的几个磁盘页面,而不是生产中的大量磁盘页面。

如果这条线索对您有意义,您能否也发布用于实际创建/刷新 mat View 的查询计划?如果它们差异很大(成本估算、计划等),请尝试在 mat View 上(可能在 search_column 本身上)创建聚簇索引,看看它是否有任何实质性差异。 (不要忘记在这样做之后进行分析。)

关于Postgresql 查询计划差异,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/19149585/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com