PostgreSQL 9.4 cascading replication failover

Reposted by: 行者123. Updated: 2023-11-29 13:17:47

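Environment:

Ubuntu 14.04 + PostgreSQL 9.4.

Here is my setup ('->' denotes physical streaming replication, PSR):

Master1 -> Slave1 (primary) -> Slave2

This works correctly: changes on Master1 are reflected on Slave1, and then on Slave2.

If I disable Master1 and promote Slave1 to master using trigger_file, Slave1 is promoted successfully and I can write to it.

However, replication between the newly promoted Slave1 and Slave2 stops.

Is this the expected behaviour? I expected replication to continue like this:

Slave1 -> Slave2

so that writes to Slave1 are reflected on Slave2.

Update

Logs:

Slave1 promotion: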

2017-10-03 16:43:20 BST  @ LOCATION:  libpqrcv_connect, libpqwalreceiver.c:107
2017-10-03 16:43:25 BST  @ FATAL:  XX000: could not connect to the primary server: could not connect to server: Connection refused
        Is the server running on host "192.168.20.55" and accepting
        TCP/IP connections on port 5432?

2017-10-03 16:43:25 BST  @ LOCATION:  libpqrcv_connect, libpqwalreceiver.c:107
2017-10-03 16:43:30 BST  @ LOG:  00000: trigger file found: /var/lib/postgresql/9.4/main/failover_trigger.5432
2017-10-03 16:43:30 BST  @ LOCATION:  CheckForStandbyTrigger, xlog.c:11440
2017-10-03 16:43:30 BST  @ LOG:  00000: redo done at 0/19000740
2017-10-03 16:43:30 BST  @ LOCATION:  StartupXLOG, xlog.c:7032
2017-10-03 16:43:30 BST  @ LOG:  00000: last completed transaction was at log time 2017-10-03 16:41:23.430752+01
2017-10-03 16:43:30 BST  @ LOCATION:  StartupXLOG, xlog.c:7037
2017-10-03 16:43:30 BST  @ LOG:  00000: selected new timeline ID: 2
2017-10-03 16:43:30 BST  @ LOCATION:  StartupXLOG, xlog.c:7153
2017-10-03 16:43:30 BST  @ LOG:  00000: archive recovery complete
2017-10-03 16:43:30 BST  @ LOCATION:  exitArchiveRecovery, xlog.c:5459
2017-10-03 16:43:30 BST  @ LOG:  00000: MultiXact member wraparound protections are now enabled
2017-10-03 16:43:30 BST  @ LOCATION:  DetermineSafeOldestOffset, multixact.c:2619
2017-10-03 16:43:30 BST  @ LOG:  00000: database system is ready to accept connections
2017-10-03 16:43:30 BST  @ LOCATION:  reaper, postmaster.c:2795
2017-10-03 16:43:30 BST  @ LOG:  00000: autovacuum launcher started
2017-10-03 16:43:30 BST  @ LOCATION:  AutoVacLauncherMain, autovacuum.c:431

Slave2:

2017-10-03 16:43:30 BST  @ LOG:  00000: replication terminated by primary server
2017-10-03 16:43:30 BST  @ DETAIL:  End of WAL reached on timeline 1 at 0/190007A8.
2017-10-03 16:43:30 BST  @ LOCATION:  WalReceiverMain, walreceiver.c:446
2017-10-03 16:43:30 BST  @ LOG:  00000: fetching timeline history file for timeline 2 from primary server
2017-10-03 16:43:30 BST  @ LOCATION:  WalRcvFetchTimeLineHistoryFiles, walreceiver.c:669
2017-10-03 16:43:30 BST  @ LOG:  00000: record with zero length at 0/190007A8
2017-10-03 16:43:30 BST  @ LOCATION:  ReadRecord, xlog.c:4184
2017-10-03 16:43:30 BST  @ LOG:  00000: restarted WAL streaming at 0/19000000 on timeline 1
2017-10-03 16:43:30 BST  @ LOCATION:  WalReceiverMain, walreceiver.c:374
2017-10-03 16:43:30 BST  @ LOG:  00000: replication terminated by primary server
2017-10-03 16:43:30 BST  @ DETAIL:  End of WAL reached on timeline 1 at 0/190007A8.

Slave1 IP:

192.168.20.56

Slave2 IP:

192.168.20.53

pg_hba.conf allows Slave2 to connect to Slave1 for replication:

Slave1 pg_hba.conf excerpt:

host    replication     replication     192.168.20.53/32        trust

Slave1 recovery.done:

standby_mode = 'on'
primary_conninfo = 'user=replication host=192.168.20.55 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
trigger_file = '/var/lib/postgresql/9.4/main/failover_trigger.5432'

Slave2 recovery.conf:

standby_mode = 'on'
primary_conninfo = 'user=replication host=192.168.20.56 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'

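Any help is much appreciated.

Update and solution

Thanks to @Vao Tsun's answer: adding recovery_target_timeline set to 'latest' in Slave2's recovery.conf and restarting (not reloading) the PostgreSQL server on Slave2 allowed replication to resume: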

standby_mode = 'on'
primary_conninfo = 'user=replication host=192.168.20.56 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
recovery_target_timeline = 'latest'
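
As a rough illustration of the change above, the following sketch adds (or overwrites) the recovery_target_timeline line in the text of a recovery.conf file. The function name ensure_latest_timeline is hypothetical and not part of PostgreSQL; it only edits a string and assumes the caller reads and writes the file and restarts the server afterwards.

```python
import re

# Matches an existing, uncommented recovery_target_timeline line.
_SETTING = re.compile(r"^[ \t]*recovery_target_timeline[ \t]*=.*$", re.MULTILINE)


def ensure_latest_timeline(conf_text: str) -> str:
    """Return recovery.conf text with recovery_target_timeline = 'latest'.

    Hypothetical helper: it only transforms text and never touches the
    server. PostgreSQL 9.4 still needs a restart to pick up the change.
    """
    line = "recovery_target_timeline = 'latest'"
    if _SETTING.search(conf_text):
        # A value is already set: overwrite it.
        return _SETTING.sub(line, conf_text)
    # No such setting yet: append it, making sure it starts on its own line.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + line + "\n"
```

Applying the function twice gives the same result as applying it once, so it should be safe to run from a provisioning script on every standby in the cascade.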

Best answer

In slave1's log you see:

2017-10-03 16:43:30 BST  @ LOG:  00000: selected new timeline ID: 2

and in slave2's:

2017-10-03 16:43:30 BST  @ DETAIL:  End of WAL reached on timeline 1 at 0/190007A8.

So slave2 did not switch to timeline 2 after the promotion.
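
The comparison above can be sketched as a small log check. This is only an illustration: the function names are hypothetical, and the patterns assume the exact message wording shown in the logs in this question.

```python
import re

# Message fragments taken from the logs quoted in this question.
NEW_TIMELINE = re.compile(r"selected new timeline ID: (\d+)")
STREAM_TIMELINE = re.compile(r"End of WAL reached on timeline (\d+)")


def promoted_timeline(log_text):
    """Timeline chosen by the promoted server, or None if not logged."""
    found = NEW_TIMELINE.findall(log_text)
    return int(found[-1]) if found else None


def standby_timeline(log_text):
    """Timeline the downstream standby last reported streaming, or None."""
    found = STREAM_TIMELINE.findall(log_text)
    return int(found[-1]) if found else None


def standby_is_behind(promoted_log, standby_log):
    """True when the standby is still on an older timeline than the promoted server."""
    new = promoted_timeline(promoted_log)
    current = standby_timeline(standby_log)
    return new is not None and current is not None and current < new


slave1_log = "2017-10-03 16:43:30 BST  @ LOG:  00000: selected new timeline ID: 2"
slave2_log = "2017-10-03 16:43:30 BST  @ DETAIL:  End of WAL reached on timeline 1 at 0/190007A8."
print(standby_is_behind(slave1_log, slave2_log))
```

With the two log lines quoted above, the check reports that slave2 is still on timeline 1 while slave1 has moved to timeline 2.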

As I said in the comments, you need recovery_target_timeline='latest' in slave2's recovery.conf.

https://www.postgresql.org/docs/current/static/recovery-target-settings.html

recovery_target_timeline (string) Specifies recovering into a particular timeline. The default is to recover along the same timeline that was current when the base backup was taken. Setting this to latest recovers to the latest timeline found in the archive, which is useful in a standby server. Other than that you only need to set this parameter in complex re-recovery situations, where you need to return to a state that itself was reached after a point-in-time recovery. See Section 25.3.5 for discussion.

Regarding PostgreSQL 9.4 cascading replication failover, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46529323/
