mysql - One node in a MySQL 5.7 InnoDB cluster crashed and cannot be rejoined to the cluster

Reposted by: 行者123 Updated: 2023-11-29 06:33:38

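We have a MySQL InnoDB cluster in one of our environments. One of the nodes in the cluster crashed. We were able to bring the crashed node back online, but we cannot rejoin it to the cluster.

Can someone help us recover the node and rejoin it to the cluster? We tried dba.rebootClusterFromCompleteOutage(), but it did not help.

Configuration: MySQL 5.7.24 Community Edition, CentOS 7, standard three-node InnoDB cluster.

Cluster status: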

MySQL  NODE02:3306 ssl  JS > var c=dba.getCluster()
MySQL  NODE02:3306 ssl  JS > c.status()
{
    "clusterName": "QACluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "NODE03:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "NODE02:3306": {
                "address": "NODE02:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "NODE03:3306": {
                "address": "NODE03:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "NODE01:3306": {
                "address": "NODE01:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            }
        }
    },
    "groupInformationSourceMember": "mysql://clusterAdmin@NODE03:3306"
}

Errors recorded in the MySQL error log:

2019-03-04T23:49:36.970839Z 3624 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log './NODE01-relay-bin-group_replication_recovery.000001' position: 4
2019-03-04T23:49:36.985336Z 3623 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'mysql_innodb_cluster_r0429584112@NODE02:3306',replication started in log 'FIRST' at position 4
2019-03-04T23:49:36.988164Z 3623 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2019-03-04T23:49:36.988213Z 3623 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
2019-03-04T23:49:36.988226Z 3623 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
2019-03-04T23:49:36.988286Z 41 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2019-03-04T23:49:36.988358Z 3624 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
2019-03-04T23:49:36.988435Z 3624 [Note] Slave SQL thread for channel 'group_replication_recovery' exiting, replication stopped in log 'FIRST' at position 0
2019-03-04T23:49:37.016864Z 41 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='NODE02', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2019-03-04T23:49:37.030769Z 41 [ERROR] Plugin group_replication reported: 'Maximum number of retries when trying to connect to a donor reached. Aborting group replication recovery.'
2019-03-04T23:49:37.030798Z 41 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2019-03-04T23:49:37.051169Z 41 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2019-03-04T23:49:37.069184Z 41 [ERROR] Plugin group_replication reported: 'Fatal error during the Recovery process of Group Replication. The server will leave the group.'
2019-03-04T23:49:37.069304Z 41 [Note] Plugin group_replication reported: 'Going to wait for view modification'
2019-03-04T23:49:40.336938Z 0 [Note] Plugin group_replication reported: 'Group membership changed: This member has left the group.'
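The key line in the log above is error 1236: the donor (NODE02) has already purged binary logs containing GTIDs that NODE01 has not applied, so distributed recovery cannot replay the missing transactions. Put differently, recovery from binary logs can only work if the donor's gtid_purged set is a subset of the joiner's gtid_executed set. The following is a minimal sketch of that check in plain JavaScript. The function names and the donor value are hypothetical and only for illustration, and the sketch assumes both sets are written in the normalized form that MySQL prints. On a live server the built-in SQL function GTID_SUBSET() performs the same comparison.

```javascript
// Parse a GTID set such as "uuid:1-5:7,uuid2:1-12" into a Map of
// uuid -> array of [start, end] intervals (both ends inclusive).
function parseGtidSet(text) {
  const result = new Map();
  for (const part of text.split(",")) {
    const trimmed = part.trim(); // MySQL prints a newline after each comma
    if (trimmed === "") continue;
    const [uuid, ...ranges] = trimmed.split(":");
    const key = uuid.toLowerCase();
    const intervals = result.get(key) || [];
    for (const range of ranges) {
      // "7" is a single transaction, "1-5" is a range; BigInt avoids
      // precision loss for very large sequence numbers.
      const [start, end = start] = range.split("-").map(BigInt);
      intervals.push([start, end]);
    }
    result.set(key, intervals);
  }
  return result;
}

// True if every GTID in subText is also contained in superText.
// Assumes superText is normalized (no adjacent or overlapping intervals).
function isGtidSubset(subText, superText) {
  const sub = parseGtidSet(subText);
  const sup = parseGtidSet(superText);
  for (const [uuid, intervals] of sub) {
    const candidates = sup.get(uuid) || [];
    for (const [start, end] of intervals) {
      const covered = candidates.some(([lo, hi]) => lo <= start && end <= hi);
      if (!covered) return false;
    }
  }
  return true;
}

// gtid_executed as a crashed node might report it.
const joinerExecuted =
  "01f27b9c-182a-11e9-a199-00505692188c:1-14134172,\n" +
  "e2aa897d-1828-11e9-85b3-00505692188c:1-12";
// Hypothetical gtid_purged on the donor; not taken from the original post.
const donorPurged = "01f27b9c-182a-11e9-a199-00505692188c:1-15000000";

// false here means the joiner cannot catch up from the donor's binary logs
// and has to be rebuilt from a backup.
console.log(isGtidSubset(donorPurged, joinerExecuted));
```

When the check returns false, rejoining alone is unlikely to succeed, because the missing transactions are no longer available in the donor's binary logs. The data has to be reseeded from a healthy member instead.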

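Best answer

I did the following to restore the failed node from a backup and was able to bring the cluster back to a healthy state.

1) This is the status of the cluster when one of the nodes (NODE01) had failed.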

 MySQL  NODE02:3306 ssl  JS > var c=dba.getCluster()
 MySQL  NODE02:3306 ssl  JS > c.status()
{
    "clusterName": "QACluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "NODE03:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "NODE02:3306": {
                "address": "NODE02:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "NODE03:3306": {
                "address": "NODE03:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "NODE01:3306": {
                "address": "NODE01:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            }
        }
    },
    "groupInformationSourceMember": "mysql://clusterAdmin@NODE03:3306"
}

2) Take a mysqldump from the primary (a healthy node) with the following command.

[root@NODE03 db_backup]# mysqldump --all-databases --add-drop-database --single-transaction --triggers --routines --port=mysql_port --user=root -p > /db_backup/mysql_dump_03062019.sql
Enter password:
Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events.

3) Remove the failed node from the cluster as follows.

 MySQL  NODE03:3306 ssl  JS > var c=dba.getCluster()
 MySQL  NODE03:3306 ssl  JS > c.rescan()
Rescanning the cluster...

Result of the rescanning operation:
{
    "defaultReplicaSet": {
        "name": "default",
        "newlyDiscoveredInstances": [],
        "unavailableInstances": [
            {
                "host": "NODE01:3306",
                "label": "NODE01:3306",
                "member_id": "e2aa897d-1828-11e9-85b3-00505692188c"
            }
        ]
    }
}

The instance 'NODE01:3306' is no longer part of the HA setup. It is either offline or left the HA group.
You can try to add it to the cluster again with the cluster.rejoinInstance('NODE01:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y/n]: Y
Removing instance from the cluster metadata...

The instance 'NODE01:3306' was successfully removed from the cluster metadata.

 MySQL  NODE03:3306 ssl  JS > c.status()
{
    "clusterName": "QACluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "NODE03:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures.",
        "topology": {
            "NODE02:3306": {
                "address": "NODE02:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "NODE03:3306": {
                "address": "NODE03:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://clusterAdmin@NODE03:3306"
}

4) Stop Group Replication on the failed node if it is still running.

mysql> STOP GROUP_REPLICATION;
Query OK, 0 rows affected (1.01 sec)

5) Reset "gtid_executed" on the failed node.

mysql> show global variables like 'GTID_EXECUTED';
+---------------+--------------------------------------------------------------------------------------------+
| Variable_name | Value                                                                                      |
+---------------+--------------------------------------------------------------------------------------------+
| gtid_executed | 01f27b9c-182a-11e9-a199-00505692188c:1-14134172,
e2aa897d-1828-11e9-85b3-00505692188c:1-12 |
+---------------+--------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> reset master;
Query OK, 0 rows affected (0.02 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.02 sec)

mysql> show global variables like 'GTID_EXECUTED';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| gtid_executed |       |
+---------------+-------+
1 row in set (0.00 sec)

6) Disable "super_read_only" on the failed node.

mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
|                  1 |                        1 |
+--------------------+--------------------------+
1 row in set (0.00 sec)

mysql> SET GLOBAL super_read_only = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
|                  1 |                        0 |
+--------------------+--------------------------+
1 row in set (0.00 sec)

7) Restore the mysqldump taken from the primary onto the failed node.

[root@E2LXQA1ALFDB01 db_backup]# mysql -uroot -p < mysql_dump_03062019.sql

8) Once the restore completes, re-enable "super_read_only" on the failed node.

mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
|                  1 |                        0 |
+--------------------+--------------------------+
1 row in set (0.00 sec)

mysql> SET GLOBAL super_read_only = 1;
Query OK, 0 rows affected (0.00 sec)


mysql> SELECT @@global.read_only, @@global.super_read_only;
+--------------------+--------------------------+
| @@global.read_only | @@global.super_read_only |
+--------------------+--------------------------+
|                  1 |                        1 |
+--------------------+--------------------------+
1 row in set (0.00 sec)

9) Finally, add the failed node back to the InnoDB cluster.

MySQL  NODE03:3306 ssl  JS > c.addInstance('clusterAdmin@NODE01:3306');
A new instance will be added to the InnoDB cluster. Depending on the amount of
data on the cluster this might take from a few seconds to several hours.

Adding instance to the cluster ...

Please provide the password for 'clusterAdmin@NODE01:3306': *******************
Save password for 'clusterAdmin@NODE01:3306'? [Y]es/[N]o/Ne[v]er (default No):
Validating instance at NODE01:3306...

This instance reports its own address as NODE01
WARNING: The following tables do not have a Primary Key or equivalent column:
ephesoft.dlf, report.correction_type, report.field_details_ag, report_archive.correction_type, report_archive.field_details_ag, report_archive.global_data_ag

Group Replication requires tables to use InnoDB and have a PRIMARY KEY or PRIMARY KEY Equivalent (non-null unique key). Tables that do not follow these requirements will be readable but not updateable when used with Group Replication. If your applications make updates (INSERT, UPDATE or DELETE) to these tables, ensure they use the InnoDB storage engine and have a PRIMARY KEY or PRIMARY KEY Equivalent.

Instance configuration is suitable.
WARNING: On instance 'NODE01:3306' membership change cannot be persisted since MySQL version 5.7.24 does not support the SET PERSIST command (MySQL version >= 8.0.11 required). Please use the .configureLocalInstance command locally to persist the changes.
WARNING: On instance 'NODE02:3306' membership change cannot be persisted since MySQL version 5.7.24 does not support the SET PERSIST command (MySQL version >= 8.0.11 required). Please use the .configureLocalInstance command locally to persist the changes.
WARNING: On instance 'NODE03:3306' membership change cannot be persisted since MySQL version 5.7.24 does not support the SET PERSIST command (MySQL version >= 8.0.11 required). Please use the .configureLocalInstance command locally to persist the changes.
The instance 'clusterAdmin@NODE01:3306' was successfully added to the cluster.


 MySQL  NODE03:3306 ssl  JS > c.status()
{
    "clusterName": "QACluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "NODE03:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "NODE01:3306": {
                "address": "NODE01:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "NODE02:3306": {
                "address": "NODE02:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "NODE03:3306": {
                "address": "NODE03:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://clusterAdmin@NODE03:3306"
}
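To confirm the result without reading the whole status document by eye, the topology can be scanned for members that are not ONLINE. The following is a minimal sketch, assuming an object with the same shape as the status output shown in this answer. The helper name membersNotOnline is hypothetical, and the sample is a plain JavaScript object trimmed from the failure-time status. Whether the value returned by c.status() in MySQL Shell behaves exactly like a plain object may depend on the shell version, so treat this as an illustration rather than a drop-in script.

```javascript
// Return the addresses of all members whose status is not "ONLINE",
// given an object shaped like the output of cluster.status().
function membersNotOnline(status) {
  const topology = status.defaultReplicaSet.topology;
  return Object.values(topology)
    .filter((member) => member.status !== "ONLINE")
    .map((member) => member.address);
}

// Trimmed copy of the status taken while NODE01 was down.
const sample = {
  clusterName: "QACluster",
  defaultReplicaSet: {
    status: "OK_NO_TOLERANCE",
    topology: {
      "NODE02:3306": { address: "NODE02:3306", mode: "R/O", status: "ONLINE" },
      "NODE03:3306": { address: "NODE03:3306", mode: "R/W", status: "ONLINE" },
      "NODE01:3306": { address: "NODE01:3306", mode: "R/O", status: "(MISSING)" },
    },
  },
};

// Logs the single member that is not ONLINE (NODE01:3306).
console.log(membersNotOnline(sample));
```

After step 9 the same call should return an empty list, which matches the "OK" status and the three ONLINE members shown above.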

On this topic (one node in a MySQL 5.7 InnoDB cluster crashed and cannot be rejoined to the cluster), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55036255/
