postgresql - Postgres: Why does select count(*) take so long

Reposted by: 行者123  Updated: 2023-11-29 12:15:10

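TL;DR
I had a problem table that was slow to query. I ran pg_repack on it to rebuild the table, but it was still slow. Unfortunately, pg_repack did not rebuild the table. I had to dump and reload the table via pg_dump.
ANALYZE showed a lot of dead rows.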

# analyse verbose payslip;
INFO:  analyzing "public.payslip"
INFO:  "payslip": scanned 30000 of 458337 pages, containing 8732 live rows and 400621 dead rows; 8732 rows in sample, 133407 estimated total rows
ANALYZE

Autovacuum was not working. This article pointed to the underlying problem:
https://www.cybertec-postgresql.com/en/reasons-why-vacuum-wont-remove-dead-rows/
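
One way to check whether autovacuum is keeping up is the statistics view pg_stat_user_tables. This is a minimal sketch, assuming the table name payslip from this question; the tuple counts are estimates from the statistics collector, not exact values.

```sql
-- Estimated live and dead tuples, plus the last time the table was vacuumed
-- manually (last_vacuum) or by the autovacuum daemon (last_autovacuum).
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'payslip';
```

A dead-tuple estimate that stays far above the live-tuple estimate even though last_autovacuum is recent would suggest that vacuum runs but is not allowed to remove the rows.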
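Original thread
I have a table of 140k rows that grows by about 500 rows per week.
A few weeks ago I investigated the queries on the table and found that all of them were slow. For example, select count(*) took 6 seconds. I rebuilt the table with pg_repack and assumed that was the end of it. Today I noticed that the table is slow again: select count(*) takes 3 seconds.
There are 138 tables in the database, and only one other table (with 1.3 million rows) takes more than a second to run select count(*).
I am wondering whether there is corruption, whether this is a bug in Postgres, or whether there is a tuning problem.
Info
Here is the count via psql (today)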

# select count(*) from payslip;
 count  
--------
 140327
(1 row)

Time: 3255.772 ms (00:03.256)

Here is the query plan

# explain select count(*) from payslip;
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Aggregate  (cost=142820.48..142820.49 rows=1 width=8)
   ->  Bitmap Heap Scan on payslip  (cost=22543.92..142479.77 rows=136285 width=0)
         ->  Bitmap Index Scan on payslip_idx3  (cost=0.00..22509.84 rows=136285 width=0)
(3 rows)

Here is the data model (truncated).

                         Table "public.payslip"
          Column          |          Type          | Collation | Nullable |                   Default                    
--------------------------+------------------------+-----------+----------+----------------------------------------------
 taxregno                 | character varying(20)  |           | not null | 
 worksid                  | character varying(8)   |           | not null | 
 cutoffdate               | character(10)          |           | not null | 
 productionid             | integer                |           | not null | 
... 

Ignore 50 columns

Indexes:
    "payslip_pkey" PRIMARY KEY, btree (taxregno, worksid, cutoffdate, productionid)
    "payslip_k1" UNIQUE, btree (taxregno, worksid, cutoffdate, productionid)
    "payslip_idx3" btree (worksid)
    "payslip_idx4" btree (ppsnumber)

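The Postgres version is currently 11. This database has been migrated from Postgres 8 to the current version over the course of 10 years. I just followed the instructions for the various Ubuntu upgrades.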

$ psql -V
psql (PostgreSQL) 11.3 (Ubuntu 11.3-1.pgdg14.04+1)

The server runs on a Linode Linux machine with SSD storage. I set the page costs in postgresql.conf to reflect SSD.

#seq_page_cost = 1.0            # measured on an arbitrary scale
random_page_cost = 1.0          # same scale as above

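Today
Unfortunately, this is a production server and I need to resolve the performance problem in the short term. So I have now run pg_repack again.
After pg_repack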

# select count(*) from payslip;
 count  
--------
 140327
(1 row)

Time: 26.216 ms

# explain select count(*) from payslip;
                              QUERY PLAN                              
----------------------------------------------------------------------
 Aggregate  (cost=10974.09..10974.10 rows=1 width=8)
   ->  Seq Scan on payslip  (cost=0.00..10623.27 rows=140327 width=0)
(2 rows)

As requested by a_horse_with_no_name, here is further information. As noted above, this was run against the table after it was rebuilt.

# explain (analyze, buffers, timing) select count(*) from payslip;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12850.75..12850.76 rows=1 width=8) (actual time=42.070..42.071 rows=1 loops=1)
   Buffers: shared hit=11022
   ->  Seq Scan on payslip  (cost=0.00..12485.00 rows=146300 width=0) (actual time=0.010..31.669 rows=140327 loops=1)
         Buffers: shared hit=11022
 Planning Time: 0.102 ms
 Execution Time: 42.115 ms
(6 rows)

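Update after one week.
It was a quiet week. The table grew by 250 rows. select count(*) slowed from .04 seconds to .7 seconds. The query changed from the faster sequential scan back to the slower bitmap index scan.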

select count(*) from payslip;
 140572

Time: 643.144 ms

Here are the details.

explain (analyze, buffers, timing) select count(*) from payslip;
 Aggregate  (cost=108251.57..108251.58 rows=1 width=8) (actual time=718.015..718.016 rows=1 loops=1)
   Buffers: shared hit=169407
   ->  Bitmap Heap Scan on payslip  (cost=8522.42..107900.14 rows=140572 width=0) (actual time=229.612..707.319 rows=140572 loops=1)
         Heap Blocks: exact=76839 lossy=84802
         Buffers: shared hit=169407
         ->  Bitmap Index Scan on payslip_idx3  (cost=0.00..8487.28 rows=140572 width=0) (actual time=205.228..205.228 rows=2212168 loops=1)
               Buffers: shared hit=7757
 Planning Time: 0.115 ms
 Execution Time: 718.069 ms

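Update after two weeks
It has been two weeks since I rebuilt the table. This week the table grew by 340 rows. The select count(*) time degraded from .6 seconds to 2 seconds.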

select count(*) from payslip;
 count  
--------
 140914
(1 row)

Time: 2077.577 ms (00:02.078)

The query plan has not changed, but execution is much slower.

explain (analyze, buffers, timing) select count(*) from payslip;
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=138089.00..138089.01 rows=1 width=8) (actual time=2068.305..2068.305 rows=1 loops=1)
   Buffers: shared hit=8 read=324086 written=1
   ->  Bitmap Heap Scan on payslip  (cost=17071.92..137736.72 rows=140914 width=0) (actual time=270.512..2056.755 rows=140914 loops=1)
         Heap Blocks: exact=8198 lossy=301091
         Buffers: shared hit=8 read=324086 written=1
         ->  Bitmap Index Scan on payslip_idx3  (cost=0.00..17036.69 rows=140914 width=0) (actual time=268.801..268.801 rows=4223367 loops=1)
               Buffers: shared read=14794
 Planning Time: 0.164 ms
 Execution Time: 2068.623 ms
(9 rows)

Time: 2069.567 ms (00:02.070)

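The index chosen (idx3) is a non-unique index with many duplicates: 22k distinct values across 140k records. The bitmap index scan shows that about 4 million rows were scanned this week (after roughly 400 inserts), against about 2 million last week for the same query, which is consistent with the performance degradation.
Information from the index maintenance queries (suggested by richyen)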

 relname | rows_in_bytes | num_rows | number_of_indexes | unique | single_column | multi_column 
---------+---------------+----------+-------------------+--------+---------------+--------------
 payslip | 138 kB        |   140914 |                 4 | Y      |             2 |            2



 schemaname | tablename |  indexname   | num_rows | table_size | index_size | unique | number_of_scans | tuples_read | tuples_fetched 
------------+-----------+--------------+----------+------------+------------+--------+-----------------+-------------+----------------
 public     | payslip   | payslip_k1   |   140914 | 2420 MB    | 244 MB     | Y      |           39720 |  3292501603 |       14295183
 public     | payslip   | payslip_idx4 |   140914 | 2420 MB    | 156 MB     | N      |           43013 |  9529447977 |       34943724
 public     | payslip   | payslip_idx3 |   140914 | 2420 MB    | 116 MB     | N      |           42812 |  3067603558 |       72358879
 public     | payslip   | payslip_pkey |   140914 | 2420 MB    | 244 MB     | Y      |            3540 |   203676311 |        4213496
(4 rows)


  size   |             idx1             |              idx2               |         idx3         | idx4 
---------+------------------------------+---------------------------------+----------------------+------
 488 MB  | payslip_pkey                 | payslip_k1                      |                      |

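At this stage I redesigned the table's indexes. I made the primary key an integer value taken from a sequence, and included the sequence number in all indexes to make them unique.
Since the indexes were rebuilt, select count(*) has gone back to doing a sequential scan. I will have to wait for the table to grow a little to see whether the query goes back to reading millions of rows.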

explain (analyze, buffers, timing) select count(*) from payslip;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=312850.42..312850.43 rows=1 width=8) (actual time=1348.241..1348.242 rows=1 loops=1)
   Buffers: shared hit=199941 read=111148
   ->  Seq Scan on payslip  (cost=0.00..312498.14 rows=140914 width=0) (actual time=209.227..1336.035 rows=140914 loops=1)
         Buffers: shared hit=199941 read=111148
 Planning Time: 0.069 ms
 Execution Time: 1348.289 ms
(6 rows)

The index information is now

 schemaname | tablename |  indexname   | num_rows | table_size | index_size | unique | number_of_scans | tuples_read | tuples_fetched 
------------+-----------+--------------+----------+------------+------------+--------+-----------------+-------------+----------------
 public     | payslip   | payslip_pkey |   140914 | 2430 MB    | 91 MB      | Y      |               0 |           0 |              0
 public     | payslip   | payslip_idx2 |   140914 | 2430 MB    | 202 MB     | Y      |               0 |           0 |              0
 public     | payslip   | payslip_idx4 |   140914 | 2430 MB    | 128 MB     | Y      |               0 |           0 |              0
 public     | payslip   | payslip_idx3 |   140914 | 2430 MB    | 128 MB     | N      |               0 |           0 |              0
(4 rows)

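Problem solved
I finally figured out the solution. My problem was that I assumed pg_repack rebuilt the table, as its name suggests. It did not. The table was completely fragmented.
For some reason, I don't know why, with the fragmented table PostgreSQL decided to do a sequential scan rather than an index scan.
This is what I should have looked at.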

# analyse verbose payslip;
INFO:  analyzing "public.payslip"
INFO:  "payslip": scanned 30000 of 458337 pages, containing 8732 live rows and 400621 dead rows; 8732 rows in sample, 133407 estimated total rows
ANALYZE
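
To put the ANALYZE output in perspective: assuming the default 8 kB block size, 458337 pages is roughly 3.5 GB of heap for about 140k live rows, whereas the freshly repacked table was read with 11022 buffers (about 86 MB). A minimal sketch for checking the same thing from the catalog, assuming the table is public.payslip:

```sql
-- relpages and reltuples are planner estimates refreshed by VACUUM and ANALYZE;
-- the size functions report the current on-disk size.
SELECT c.relpages,
       c.reltuples,
       pg_size_pretty(pg_relation_size(c.oid))       AS heap_size,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size_with_indexes
FROM pg_class AS c
WHERE c.oid = 'public.payslip'::regclass;
```

A heap that is many times larger than the row count justifies is a hint to look at dead rows before suspecting the planner.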

Dumping and reloading the table with pg_dump quickly fixed the problem.
I investigated the issue further and found this excellent article.
https://www.cybertec-postgresql.com/en/reasons-why-vacuum-wont-remove-dead-rows/
There were two prepared transactions in the database that were preventing autovacuum from working properly.
select gid, prepared, owner, database, transaction as xmin
-# from pg_prepared_xacts
-# order by age(transaction) desc;
                 gid                  |           prepared            | owner | database |  xmin   
--------------------------------------+-------------------------------+-------+----------+---------
 _sa_4f7780bb6653ccb70ddaf2143ac7a232 | 2019-08-12 13:00:11.738766+01 | kevin | kevin    | 1141263
 _sa_0db277aebcb44884763fe6245d702fe  | 2019-09-19 14:00:11.977378+01 | kevin | kevin    | 2830229
(2 rows)
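
The post does not show how the prepared transactions were cleared, so the following is only a sketch of one way to resolve orphaned prepared transactions. The gid literal is a placeholder, and the table name payslip is taken from the question.

```sql
-- List prepared transactions, oldest first.
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;

-- Resolve an orphan. Replace the placeholder with a gid returned above.
-- Use COMMIT PREPARED instead if the transaction's work should be kept.
ROLLBACK PREPARED 'gid-from-the-query-above';

-- With nothing holding back the xmin horizon, vacuum can remove the dead rows.
VACUUM (VERBOSE, ANALYZE) payslip;
```

ROLLBACK PREPARED has to be run outside a transaction block, while connected to the database the transaction was prepared in, by the user who prepared it or by a superuser. A plain VACUUM mostly makes the space reusable rather than returning it to the operating system, so a table that is already badly bloated may still need a rewrite (for example a dump and reload, as described above) to shrink.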
Thanks to everyone for their help.

Best answer

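The change from last week to this week shows that much of payslip's data is no longer in cache (see the change in the hit and read parts of Buffers:).
Note also that your Heap Blocks are becoming more and more lossy, which means your work_mem is probably set too low for the operation.
You should probably increase work_mem to at least 25MB for this week (the latest statistics show that about 309k page blocks are being accessed). However, you will probably need to increase it further as the table grows. work_mem can be set per session, so you could set it based on a pattern-based prediction of the table size (I don't like that idea, but it is also not recommended to set work_mem arbitrarily high, because a global setting can lead to memory over-allocation).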
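
A minimal sketch of trying this suggestion for a single session, assuming the 25MB figure from the answer and the payslip table from the question:

```sql
-- Raise work_mem for the current session only.
SET work_mem = '25MB';

-- Re-run the plan and compare the "Heap Blocks: exact=... lossy=..." line
-- with the earlier output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM payslip;

-- Go back to the configured default.
RESET work_mem;
```

If the bitmap fits in memory, the lossy count should drop or disappear. This addresses only the recheck overhead of the bitmap heap scan, not the number of pages the scan has to visit.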
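I'm not too familiar with the internals of pg_repack, but I wonder whether you see the performance boost after a repack because things are held in memory and get flushed out over time.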
Disclosure: I work for EnterpriseDB (EDB)

Regarding "Postgres: Why does select count(*) take so long", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58449716/
