postgresql - Airflow psycopg2.OperationalError: FATAL: sorry, too many clients already

Reposted · Author: 行者123 · Updated: 2023-11-29 11:29:35

I have a four-node clustered Airflow environment that has been running well for several months.

EC2 instances:

  • Server 1: web server, scheduler, Redis queue, PostgreSQL database
  • Server 2: web server
  • Server 3: worker
  • Server 4: worker

Recently I have been working on a more complex DAG with a few dozen tasks, compared with the relatively small ones I worked on before. I'm not sure whether that is why I'm only now seeing this error, but I occasionally get:

In the Airflow UI, under the task logs:

psycopg2.OperationalError: FATAL: sorry, too many clients already

On the web server (in the output of running airflow webserver) I get the same error:

[2018-07-23 17:43:46 -0400] [8116] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2158, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 403, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 788, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 532, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1193, in _do_get
    self._dec_overflow()
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1190, in _do_get
    return self._create_connection()
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 350, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 477, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 671, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 106, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 410, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/local/lib64/python3.6/site-packages/psycopg2/__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL: sorry, too many clients already

I can work around this by running sudo /etc/init.d/postgresql restart and restarting the DAG, but after about three runs the error starts appearing again.

I couldn't find any Airflow-specific details on this problem, but from other posts I've found, such as this one, the cause is said to be that the client (Airflow, I assume, in this case) is trying to open more connections to PostgreSQL than PostgreSQL is configured to handle. I ran this command and found that my PostgreSQL accepts a maximum of 100 connections:

[ec2-user@ip-1-2-3-4 ~]$ sudo su
root@ip-1-2-3-4
[/home/ec2-user]# psql -U postgres
psql (9.2.24)
Type "help" for help.

postgres=# show max_connections;
max_connections
-----------------
100
(1 row)
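To see how close the server actually is to that limit, you can count the open connections from the same psql session. `pg_stat_activity` is a standard PostgreSQL statistics view (present in 9.2 as well); the grouping query below is a diagnostic sketch for spotting which client host is holding the connections:

```sql
-- Count open connections against max_connections
SELECT count(*) AS open_connections FROM pg_stat_activity;

-- Break the connections down by client host and database
SELECT client_addr, datname, count(*) AS n
FROM pg_stat_activity
GROUP BY client_addr, datname
ORDER BY n DESC;
```

If the first number is near 100 while the DAG is running, the limit itself is the bottleneck; if one host dominates the second query, that host's processes are the ones to tune.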

This solution says I could increase my PostgreSQL max connections, but I'm wondering whether I should instead set a value in my airflow.cfg file so that the number of connections Airflow is allowed to open matches my PostgreSQL max connection limit. Does anyone know where I can set this value in Airflow? Here are the fields I think are relevant:

# The SqlAlchemy pool size is the maximum number of database connections
# in the pool.
sql_alchemy_pool_size = 5

# The SqlAlchemy pool recycle is the number of seconds a connection
# can be idle in the pool before it is invalidated. This config does
# not apply to sqlite.
sql_alchemy_pool_recycle = 3600

# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32

# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 32

# When not using pools, tasks are run in the "default pool",
# whose size is guided by this config element
non_pooled_task_slot_count = 128

# The maximum number of active DAG runs per DAG
max_active_runs_per_dag = 32
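One way to see why 100 connections can run out is to add up a rough connection budget. Each Airflow process (the scheduler, every webserver Gunicorn worker, every Celery worker process) holds its own SQLAlchemy pool of up to `sql_alchemy_pool_size` connections, plus SQLAlchemy's default `max_overflow`. The process counts below are assumptions for a cluster shaped like the one above, not measured values:

```python
# Hypothetical worst-case connection budget for this cluster.
# Every Airflow process keeps its own SQLAlchemy pool; the database
# sees the sum of all pools, not the single pool_size value.

sql_alchemy_pool_size = 5    # from the airflow.cfg snippet above
max_overflow = 10            # SQLAlchemy's default pool overflow

webserver_procs = 4 * 2      # assumed: 4 gunicorn workers per webserver, 2 webservers
scheduler_procs = 1
worker_procs = 16 * 2        # assumed: 16 Celery worker processes per box, 2 boxes

processes = webserver_procs + scheduler_procs + worker_procs
worst_case = processes * (sql_alchemy_pool_size + max_overflow)

print(processes)   # 41 processes
print(worst_case)  # 615 possible connections -- far above max_connections = 100
```

Under these assumptions the fleet can demand several times more connections than PostgreSQL allows, which matches the symptom of the error appearing only once the DAG got big enough to saturate the workers.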

Any suggestions for fixing this are welcome. Is this related to my Airflow configuration, or is there a problem with my PostgreSQL configuration?

Also, because I'm testing a new DAG, I sometimes kill running tasks and restart them. Perhaps doing this leaves some processes that fail to terminate correctly and keep stale connections to PostgreSQL open?

Best answer

I ran into a similar problem. I changed max_connections in Postgres to 10000 and sql_alchemy_pool_size in the Airflow config to 1000. Now I can run hundreds of tasks in parallel.
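In config terms, the two changes described above would look roughly like this (values copied from the answer; whether such a high max_connections is sensible depends on your hardware, since each PostgreSQL backend costs memory):

```
# postgresql.conf
max_connections = 10000

# airflow.cfg, [core] section
sql_alchemy_pool_size = 1000
```

PostgreSQL needs a restart for max_connections to take effect, and every Airflow component reading airflow.cfg should be restarted as well.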

PS: My machine has 32 cores and 60 GB of RAM, so it can handle the load.

Regarding "postgresql - Airflow psycopg2.OperationalError: FATAL: sorry, too many clients already", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51487740/
