gpt4 book ai didi

Postgres seldom deadlock diaglostics(Postgres很少死锁对角论)

转载 作者:bug小助手 更新时间:2023-10-25 20:03:02 25 4
gpt4 key购买 nike



In Postgres running on AWS RDS sometimes (once per 3 month) deadlock happens. Symptoms: application stop normally work, DBLoadCPU increased few times comparing to normal. Application reboot doesn't help, DB reboot solve problem.
In app log at the time when DBLoadCPU begin to increase I see batch update which usually takes few seconds but in this case newer finished. This batch update very simple - transaction with a loop from application few thousand times:

在AWS RDS上运行的Postgres中,有时(每3个月一次)会发生死锁。症状:应用程序停止正常工作,DBLoadCPU比正常增加几倍。应用程序重启没有帮助,数据库重启解决了问题。在应用程序日志中,当DBLoadCPU开始增加时,我看到批量更新,通常需要几秒钟,但在这种情况下,更新完成。此批处理更新非常简单-使用来自应用程序的循环执行几千次:


UPDATE BigTable
SET ColA = value
WHERE PK = some-key;

After that UPDATE table_a ...
Only another update on BigTable I see is procedure, called every 3 seconds with different conditions:

在更新TABLE_A之后。我看到的Bigtable上唯一的另一个更新是Procedure,在不同的条件下每3秒调用一次:


PROCEDURE proc1
...
BEGIN
LOCK TABLE table_a IN EXCLUSIVE MODE;
...
LOOP -- about 10 times
...
UPDATE BigTable
SET ColB = some-value
WHERE some-condition;
...
UPDATE BigTable
SET ColC = ColC + 1
WHERE some-condition;

UPDATE table_a ...
END LOOP;

By application logic batch update should newer update same rows as proc1 in the same time. I see no reason for deadlock here. DB log not anymore available.

根据应用程序逻辑,批处理更新应该更新与pro1相同的行。我看不出有什么理由在这里陷入僵局。数据库日志不再可用。


How to diagnose such a problem? If it happens again will be not much time - need reboot DB soon.

如何诊断这样的问题?如果再次发生这种情况,将不会有太多时间-需要重新启动数据库。


Could you recommend good article or video?

你能推荐好的文章或视频吗?


更多回答

That is definitely not a deadlock. Deadlocks get resolved in the database, and locked processes don't consume CPU. When the problem happens, take some snapshots of pg_stat_activity. It is likely that this is an application problem.

这绝对不是一个僵局。死锁在数据库中得到解决,锁定的进程不会消耗CPU。当问题发生时,拍摄一些pg_stat_active的快照。这很可能是一个应用程序问题。

Thank you @LaurenzAlbe, I updated description and mention exclusive lock. I still suspect deadlock because problem exists until DB reboot. May be lock IN EXCLUSIVE MODE must be replaced to ROW EXCLUSIVE? From batch update seems impossible lock same tables in the same order because connection used in other places of application.

谢谢@LaurenzAlbe,我更新了描述并提到了独家锁。我仍然怀疑死锁,因为在数据库重新启动之前问题一直存在。是否必须将LOCK IN EXCLUSIVE模式替换为行独占?从批处理更新似乎不可能以相同的顺序锁定相同的表,因为连接在应用程序的其他位置使用。

Sorry, but your description is too vague.

对不起,你的描述太含糊了。

优秀答案推荐


  1. EXCLUSIVE MODE is overkill, remove lock or change to ROW EXCLUSIVE.

  2. Replace loop with batch update like:


    UPDATE a 
FROM (VALUES
('id1', 1),
('id2', 2),
...
) AS val(id, status)
WHERE a.id = val.id;

to make it at least 100 time faster and decrease DB load.

使其速度至少提高100倍,并减少数据库负载。


Conclusion


From logs seen - proc1 hung forever and block almost all access to table_a.
Changes 1. and 2. solved a problem.
It is still unclear what blocked proc1 after it acquired the lock, but figuring it out requires a lot of effort to reproduce this situation, which is not practical.

从日志中可以看到-pro1永远挂起,并阻止几乎所有对表a的访问。更改1.和2.解决了一个问题。目前还不清楚是什么阻止了pro1获得锁,但要弄清楚它需要付出很大努力才能重现这种情况,这是不现实的。


更多回答

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com