tl;dr
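When a brand-new Percona cluster of 3 Kubernetes pods starts, the seqno in grastate.dat is set to -1 and never changes. When I delete one pod and watch it restart, expecting it to rejoin the cluster, it sets its initial position to 00000000-0000-0000-0000-000000000000:-1 and tries to connect to itself (at its previous IP), perhaps because it was the first pod in the cluster? It then times out on that mistaken connection to itself: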
2017-03-26T08:38:05.374058Z 0 [Note] WSREP: (b7571ff8, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.52.0.26:4567 timed out, no messages seen in PT3S
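The cluster does not come back up properly, and I cannot successfully restart a pod in the cluster.
Full version
When I start the cluster from scratch, with blank data directories and a fresh etcd cluster, everything appears to come up. However, when I look at grastate.dat, the seqno for every pod is -1: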
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
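A minimal sketch for reading a single field out of these files, so the three pods can be compared quickly. The helper name grastate_field is hypothetical (it is not part of Percona or Galera), and the sketch assumes the "key: value" layout shown above.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one field from a grastate.dat file.
# Usage: grastate_field <path-to-grastate.dat> <key>
grastate_field() {
    local file="$1" key="$2"
    # Lines look like "seqno: -1"; the "# GALERA saved state" comment line
    # contains no colon, so it never matches a key.
    awk -F': *' -v key="$key" '$1 == key { print $2; exit }' "$file"
}
```

For example, grastate_field percona-0/grastate.dat seqno prints -1 for the file shown above, and the same call with safe_to_bootstrap prints 0.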
With the cluster in this state, I can run mysql -h percona -u wordpress -p and connect, and WordPress works too.
Scenario: I have 3 Percona pods:
jonathan@ubuntu:~/Projects/k8wp$ kubectl get pods
NAME        READY   STATUS    RESTARTS   AGE
etcd-0      1/1     Running   1          12h
etcd-1      1/1     Running   0          12h
etcd-2      1/1     Running   3          12h
etcd-3      1/1     Running   1          12h
percona-0   1/1     Running   0          8m
percona-1   1/1     Running   0          57m
percona-2   1/1     Running   0          57m
When I restart percona-0, it is kicked out of the cluster on restart. The gvwstate.dat file for percona-0 shows:
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/gvwstate.dat
my_uuid: b7571ff8-11f8-11e7-bd2d-8b50487e1523
#vwbeg
view_id: 3 b7571ff8-11f8-11e7-bd2d-8b50487e1523 3
bootstrap: 0
member: b7571ff8-11f8-11e7-bd2d-8b50487e1523 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0
#vwend
The other 2 pods in the cluster show:
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/gvwstate.dat
my_uuid: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a
#vwbeg
view_id: 3 bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 4
bootstrap: 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0
#vwend
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/gvwstate.dat
my_uuid: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a
#vwbeg
view_id: 3 bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 4
bootstrap: 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0
#vwend
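The files above disagree: percona-0 still lists three members, while percona-1 and percona-2 list only two. A small sketch for checking this from the shell follows; the helper names gvwstate_members and view_has_member are hypothetical, and the sketch assumes the "member: <uuid> <segment>" line format shown above.

```shell
#!/bin/bash
# Hypothetical helper: print the member UUIDs recorded in a gvwstate.dat file,
# one per line.
gvwstate_members() {
    awk '$1 == "member:" { print $2 }' "$1"
}

# Hypothetical helper: succeed (exit 0) only if the given UUID appears as a
# member in the given gvwstate.dat file.
view_has_member() {
    local file="$1" uuid="$2"
    awk -v u="$uuid" '$1 == "member:" && $2 == u { found = 1 } END { exit !found }' "$file"
}
```

For example, view_has_member percona-1/gvwstate.dat b7571ff8-11f8-11e7-bd2d-8b50487e1523 fails for the file shown above, which confirms that percona-1 no longer counts percona-0 as part of its view.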
Here are what I believe are the relevant errors from percona-0's startup:
2017-03-26T08:37:58.370605Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-03-26T08:37:58.372537Z 0 [Note] WSREP: gcomm: connecting to group 'wordpress-001', peer '10.52.0.26:'
2017-03-26T08:38:01.373345Z 0 [Note] WSREP: (b7571ff8, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.52.0.26:4567 timed out, no messages seen in PT3S
2017-03-26T08:38:01.373682Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-03-26T08:38:01.373750Z 0 [Note] WSREP: view(view_id(NON_PRIM,b7571ff8,5) memb {
b7571ff8,0
} joined {
} left {
} partitioned {
})
2017-03-26T08:38:01.373838Z 0 [Note] WSREP: gcomm: connected
2017-03-26T08:38:01.373872Z 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2017-03-26T08:38:01.373987Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-03-26T08:38:01.374012Z 0 [Note] WSREP: Opened channel 'wordpress-001'
2017-03-26T08:38:01.374108Z 0 [Note] WSREP: Waiting for SST to complete.
2017-03-26T08:38:01.374417Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2017-03-26T08:38:01.374469Z 0 [Note] WSREP: Flow-control interval: [16, 16]
2017-03-26T08:38:01.374491Z 0 [Note] WSREP: Received NON-PRIMARY.
2017-03-26T08:38:01.374560Z 1 [Note] WSREP: New cluster view: global state: :-1, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version -1
It tries to connect to 10.52.0.26, as shown in this line:
2017-03-26T08:37:58.372537Z 0 [Note] WSREP: gcomm: connecting to group 'wordpress-001', peer '10.52.0.26:'
That is actually the pod's previous IP. Here is the key listing I took from etcd before deleting percona-0:
/ # etcdctl ls --recursive
/pxc-cluster
/pxc-cluster/wordpress
/pxc-cluster/queue
/pxc-cluster/queue/wordpress
/pxc-cluster/queue/wordpress-001
/pxc-cluster/wordpress-001
/pxc-cluster/wordpress-001/10.52.1.46
/pxc-cluster/wordpress-001/10.52.1.46/ipaddr
/pxc-cluster/wordpress-001/10.52.1.46/hostname
/pxc-cluster/wordpress-001/10.52.2.33
/pxc-cluster/wordpress-001/10.52.2.33/ipaddr
/pxc-cluster/wordpress-001/10.52.2.33/hostname
/pxc-cluster/wordpress-001/10.52.0.26
/pxc-cluster/wordpress-001/10.52.0.26/hostname
/pxc-cluster/wordpress-001/10.52.0.26/ipaddr
After kubectl delete pods/percona-0:
/ # etcdctl ls --recursive
/pxc-cluster
/pxc-cluster/queue
/pxc-cluster/queue/wordpress
/pxc-cluster/queue/wordpress-001
/pxc-cluster/wordpress-001
/pxc-cluster/wordpress-001/10.52.1.46
/pxc-cluster/wordpress-001/10.52.1.46/ipaddr
/pxc-cluster/wordpress-001/10.52.1.46/hostname
/pxc-cluster/wordpress-001/10.52.2.33
/pxc-cluster/wordpress-001/10.52.2.33/ipaddr
/pxc-cluster/wordpress-001/10.52.2.33/hostname
/pxc-cluster/wordpress
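To compare the two listings, it can help to reduce them to just the registered node IPs. The sketch below assumes the /pxc-cluster/<cluster>/<ip> key layout shown above; the function name registered_ips is hypothetical, and it only filters text on stdin, so it does not talk to etcd itself.

```shell
#!/bin/bash
# Hypothetical helper: read `etcdctl ls --recursive` output on stdin and print
# the IPs registered directly under /pxc-cluster/<cluster>.
registered_ips() {
    local cluster="$1"
    # Split on "/": a key such as /pxc-cluster/wordpress-001/10.52.1.46 has
    # exactly 4 fields (the first is empty). Deeper keys (ipaddr, hostname)
    # and the queue keys are skipped.
    awk -F/ -v c="$cluster" 'NF == 4 && $2 == "pxc-cluster" && $3 == c { print $4 }'
}
```

A possible use is etcdctl ls --recursive | registered_ips wordpress-001, run before and after deleting the pod, to see which address disappeared.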
Also, during the restart percona-0 tries to register itself with etcd:
{"action":"create","node":{"key":"/pxc-cluster/queue/wordpress-001/00000000000000009886","value":"10.52.0.27","expiration":"2017-03-26T08:38:57.980325718Z","ttl":60,"modifiedIndex":9886,"createdIndex":9886}}
{"action":"set","node":{"key":"/pxc-cluster/wordpress-001/10.52.0.27/ipaddr","value":"10.52.0.27","expiration":"2017-03-26T08:38:28.01814818Z","ttl":30,"modifiedIndex":9887,"createdIndex":9887}}
{"action":"set","node":{"key":"/pxc-cluster/wordpress-001/10.52.0.27/hostname","value":"percona-0","expiration":"2017-03-26T08:38:28.037188157Z","ttl":30,"modifiedIndex":9888,"createdIndex":9888}}
{"action":"update","node":{"key":"/pxc-cluster/wordpress-001/10.52.0.27","dir":true,"expiration":"2017-03-26T08:38:28.054726795Z","ttl":30,"modifiedIndex":9889,"createdIndex":9887},"prevNode":{"key":"/pxc-cluster/wordpress-001/10.52.0.27","dir":true,"modifiedIndex":9887,"createdIndex":9887}}
This does not work.
From percona-1, the second member of the cluster:
2017-03-26T08:37:44.069583Z 0 [Note] WSREP: (bd05a643, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.52.0.26:4567
2017-03-26T08:37:45.069756Z 0 [Note] WSREP: (bd05a643, 'tcp://0.0.0.0:4567') reconnecting to b7571ff8 (tcp://10.52.0.26:4567), attempt 0
2017-03-26T08:37:48.570332Z 0 [Note] WSREP: (bd05a643, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.52.0.26:4567 timed out, no messages seen in PT3S
2017-03-26T08:37:49.605089Z 0 [Note] WSREP: evs::proto(bd05a643, GATHER, view_id(REG,b7571ff8,3)) suspecting node: b7571ff8
2017-03-26T08:37:49.605276Z 0 [Note] WSREP: evs::proto(bd05a643, GATHER, view_id(REG,b7571ff8,3)) suspected node without join message, declaring inactive
2017-03-26T08:37:50.104676Z 0 [Note] WSREP: declaring c33d6a73 at tcp://10.52.2.33:4567 stable
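New information: I restarted percona-0 again, and this time it somehow came up! After a few attempts I realized the pod has to be restarted twice to come up: after the first delete it fails with the errors above, and after the second delete it comes up fine and syncs with the other members. Could that be because it was the first pod in the cluster?
I tested deleting the other pods, and they all come back fine.
The problem is only with percona-0.
Also: taking all the pods down at once, which is what would happen if my node crashed, is the case where the pods do not come back at all! I suspect this is because no state is saved to grastate.dat, i.e. seqno stays at -1 even though the global ID may have changed. The pods exit, shutting down mysqld, with the following errors: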
jonathan@ubuntu:~/Projects/k8wp$ kubectl logs percona-2 | grep ERROR
2017-03-26T11:20:25.795085Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
2017-03-26T11:20:25.795276Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-26T11:20:25.795544Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'wordpress-001' at 'gcomm://10.52.2.36': -110 (Connection timed out)
2017-03-26T11:20:25.795618Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-26T11:20:25.795645Z 0 [ERROR] WSREP: wsrep::connect(gcomm://10.52.2.36) failed: 7
2017-03-26T11:20:25.795693Z 0 [ERROR] Aborting
jonathan@ubuntu:~/Projects/k8wp$ kubectl logs percona-1 | grep ERROR
2017-03-26T11:20:27.093780Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
2017-03-26T11:20:27.093977Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-26T11:20:27.094145Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'wordpress-001' at 'gcomm://10.52.1.49': -110 (Connection timed out)
2017-03-26T11:20:27.094200Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-26T11:20:27.094227Z 0 [ERROR] WSREP: wsrep::connect(gcomm://10.52.1.49) failed: 7
2017-03-26T11:20:27.094247Z 0 [ERROR] Aborting
jonathan@ubuntu:~/Projects/k8wp$ kubectl logs percona-0 | grep ERROR
2017-03-26T11:20:52.040214Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
2017-03-26T11:20:52.040279Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-26T11:20:52.040385Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'wordpress-001' at 'gcomm://10.52.2.36': -110 (Connection timed out)
2017-03-26T11:20:52.040437Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-26T11:20:52.040471Z 0 [ERROR] WSREP: wsrep::connect(gcomm://10.52.2.36) failed: 7
2017-03-26T11:20:52.040508Z 0 [ERROR] Aborting
grastate.dat after deleting all pods:
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
There is no gvwstate.dat.
Best answer
I fixed it by changing the container's entrypoint to the following script:
#!/bin/bash
# Allow this node to bootstrap the cluster, then start it as a new cluster.
sed -i "s|safe_to_bootstrap.*:.*|safe_to_bootstrap:1|1" /var/lib/mysql/grastate.dat
/entrypoint.sh --wsrep-new-cluster
The problem was that when the 3 pods restarted after a crash, they all hit the following error:
[ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
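This means (summarizing from the link) that because all pods were down, the first pod (the pods are managed by a StatefulSet) comes up and tries to reconnect to the cluster, finds no other pod it can connect to, and goes down. The next pod comes up, tries the same thing, hits the same error, and goes down, and so on.
The solution is to have the first pod start a new cluster when it comes up; every subsequent pod then comes up and finds a node to connect to. It still serves all the data.
With Percona XtraDB, the Docker container's entrypoint looks like this: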
exec mysqld --user=mysql --wsrep_cluster_name=$CLUSTER_NAME --wsrep_cluster_address="gcomm://$cluster_join" --wsrep_sst_method=xtrabackup-v2 --wsrep_sst_auth="xtrabackup:$XTRABACKUP_PASSWORD" --log-error=${DATADIR}error.log $CMDARG
So, to get the setup running, all I had to do was pass the argument --wsrep-new-cluster to the /entrypoint.sh file, like so:
/entrypoint.sh --wsrep-new-cluster
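One possible refinement, sketched here under stated assumptions, is to pass --wsrep-new-cluster only on the first pod of the StatefulSet instead of on every pod. This is not what the script above does. The helper name bootstrap_args is hypothetical, and the sketch assumes pod hostnames of the form <name>-<ordinal>, such as percona-0.

```shell
#!/bin/bash
# Hypothetical helper: print the extra flag to pass to /entrypoint.sh for a
# given pod hostname. Only ordinal 0 (e.g. percona-0) gets --wsrep-new-cluster;
# every other pod gets nothing and joins the existing cluster as usual.
bootstrap_args() {
    local host="$1"
    local ordinal="${host##*-}"   # text after the last "-", e.g. "0"
    if [ "$ordinal" = "0" ]; then
        echo "--wsrep-new-cluster"
    fi
}
```

The wrapper would then end with something like exec /entrypoint.sh $(bootstrap_args "$(hostname)"). Note that bootstrapping a new cluster is only appropriate when no other member is running, so a real setup would probably also need to check whether any peer is reachable before choosing this flag.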
P.S. I first tried the above on its own, but I got an error stating that, to force a new cluster and bootstrap from this node, I had to set safe_to_bootstrap from 0 to 1 in /var/lib/mysql/grastate.dat.
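The safe_to_bootstrap edit can be wrapped in a small function that takes the file path as a parameter, which makes it easy to try on a copy of grastate.dat first. This is a sketch only: the name mark_safe_to_bootstrap is hypothetical, and it writes through a temporary file rather than using sed -i as the answer does.

```shell
#!/bin/bash
# Hypothetical helper: set safe_to_bootstrap to 1 in the given grastate.dat,
# leaving every other line untouched.
mark_safe_to_bootstrap() {
    local file="$1"
    # Write to a temp file and move it into place, so the sketch does not
    # rely on the in-place (-i) option of sed.
    sed 's|^safe_to_bootstrap:.*|safe_to_bootstrap: 1|' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

In the container this would be called as mark_safe_to_bootstrap /var/lib/mysql/grastate.dat before starting mysqld with --wsrep-new-cluster.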
Regarding "mysql - Percona MySQL XtraDB cluster does not start up properly and node restart does not work", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43027043/