gpt4 book ai didi

dns - 经过一段时间后,kube-dns停止工作

转载 作者:行者123 更新时间:2023-12-02 11:44:28 24 4
gpt4 key购买 nike

每当我初始化一个新集群时,一切都可以在3天到一个月左右的时间内完美运行。然后,kube-dns完全停止运行。我可以将其放入kubedns容器中,尽管我真的不知道要寻找什么,但它似乎运行良好。我可以ping主机名,它可以解析并且可以访问,因此kubedns容器本身仍然具有dns服务。它只是不为集群中的其他容器提供它。自从启动之前一直在运行的两个容器中都发生了故障(因此它们曾经能够解析+ ping主机名,但是现在无法解析它,但是仍然可以ping IP),并且创建了新的容器。

我不确定是否与时间有关,或者与创建的作业或 Pane 的数量有关。最近的事件发生在创建了32个 pods 和20个工作岗位之后。

如果我使用以下命令删除kube-dns连载:

kubectl delete pod --namespace kube-system kube-dns-<pod_id>

创建了一个新的kube-dns pod,一切恢复正常(DNS适用于所有新旧容器)。

我有一个主节点和两个工作节点。它们都是CentOS 7机器。

要设置群集,请在主服务器上运行:
systemctl start etcd
etcdctl mkdir /kube-centos/network
etcdctl mk /kube-centos/network/config "{ "Network": "172.30.0.0/16", "SubnetLen": 24, "Backend": { "Type": "vxlan" } }"
systemctl disable etcd && systemctl stop etcd
systemctl enable docker && systemctl start docker
systemctl enable kubelet && systemctl start kubelet
kubeadm init --kubernetes-version v1.10.0 --apiserver-advertise-address=$(hostname) --ignore-preflight-errors=DirAvailable--var-lib-etcd
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=1.10&env.WEAVE_NO_FASTDP=1&env.CHECKPOINT_DISABLE=1"
kubectl --kubeconfig=/etc/kubernetes/admin.conf proxy -p 80 --accept-hosts='.*' --address=<master_ip> &

在这两个 worker 上,我跑:
kubeadm join --token <K8S_MASTER_HOST>:6443 --discovery-token-ca-cert-hash sha256: --ignore-preflight-errors cri

这是我运行过的一些shell命令+输出,可能会有用:

在故障开始发生之前,这是一个正在其中一个工作程序上运行的容器:
bash-4.4# env
PACKAGES= dumb-init musl libc6-compat linux-headers build-base bash git ca-certificates python3 python3-dev
HOSTNAME=network-test
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
KUBERNETES_PORT=tcp://10.96.0.1:443
PWD=/
HOME=/root
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP_PORT=443
ALPINE_VERSION=3.7
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
TERM=xterm
SHLVL=1
KUBERNETES_SERVICE_PORT=443
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
KUBERNETES_SERVICE_HOST=10.96.0.1
_=/usr/bin/env

bash-4.4# ifconfig
eth0 Link encap:Ethernet HWaddr 8A:DD:6E:E8:C4:E3
inet addr:10.44.0.1 Bcast:10.47.255.255 Mask:255.240.0.0
UP BROADCAST RUNNING MULTICAST MTU:65535 Metric:1
RX packets:1 errors:0 dropped:0 overruns:0 frame:0
TX packets:1 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:42 (42.0 B) TX bytes:42 (42.0 B)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

bash-4.4# ip route
default via 10.44.0.0 dev eth0
10.32.0.0/12 dev eth0 scope link src 10.44.0.1

bash-4.4# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local dc1.int.company.com dc2.int.company.com dc3.int.company.com
options ndots:5

-- Note that even when working, the DNS IP can't be pinged. --
bash-4.4# ping 10.96.0.10
PING 10.96.0.10 (10.96.0.10): 56 data bytes
-- Never unblocks. So we know it's fine that the container can't ping the DNS IP. --

bash-4.4# ping 10.44.0.0
PING 10.44.0.0 (10.44.0.0): 56 data bytes
64 bytes from 10.44.0.0: seq=0 ttl=64 time=0.139 ms
64 bytes from 10.44.0.0: seq=1 ttl=64 time=0.124 ms
--- 10.44.0.0 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.124/0.131/0.139 ms

bash-4.4# ping somehost.env.dc1.int.company.com
PING somehost.env.dc1.int.company.com (10.112.17.2): 56 data bytes
64 bytes from 10.112.17.2: seq=0 ttl=63 time=0.467 ms
64 bytes from 10.112.17.2: seq=1 ttl=63 time=0.271 ms
64 bytes from 10.112.17.2: seq=2 ttl=63 time=0.214 ms
64 bytes from 10.112.17.2: seq=3 ttl=63 time=0.241 ms
64 bytes from 10.112.17.2: seq=4 ttl=63 time=0.350 ms
--- somehost.env.dc1.int.company.com ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.214/0.308/0.467 ms

bash-4.4# ping 10.112.17.2
PING 10.112.17.2 (10.112.17.2): 56 data bytes
64 bytes from 10.112.17.2: seq=0 ttl=63 time=0.474 ms
64 bytes from 10.112.17.2: seq=1 ttl=63 time=0.404 ms
--- 10.112.17.2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.404/0.439/0.474 ms

bash-4.4# ping worker1.env
PING worker1.env (10.112.5.50): 56 data bytes
64 bytes from 10.112.5.50: seq=0 ttl=64 time=0.051 ms
64 bytes from 10.112.5.50: seq=1 ttl=64 time=0.102 ms
--- worker1.env ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.051/0.076/0.102 ms

故障开始后,一直运行相同的容器:
bash-4.4# env
PACKAGES= dumb-init musl libc6-compat linux-headers build-base bash git ca-certificates python3 python3-dev
HOSTNAME=vda-test
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
KUBERNETES_PORT=tcp://10.96.0.1:443
PWD=/
HOME=/root
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP_PORT=443
ALPINE_VERSION=3.7
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
TERM=xterm
SHLVL=1
KUBERNETES_SERVICE_PORT=443
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
KUBERNETES_SERVICE_HOST=10.96.0.1
OLDPWD=/root
_=/usr/bin/env

bash-4.4# ifconfig
eth0 Link encap:Ethernet HWaddr 22:5E:D5:72:97:98
inet addr:10.44.0.2 Bcast:10.47.255.255 Mask:255.240.0.0
UP BROADCAST RUNNING MULTICAST MTU:65535 Metric:1
RX packets:1645 errors:0 dropped:0 overruns:0 frame:0
TX packets:1574 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:718909 (702.0 KiB) TX bytes:150313 (146.7 KiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

bash-4.4# ip route
default via 10.44.0.0 dev eth0
10.32.0.0/12 dev eth0 scope link src 10.44.0.2

bash-4.4# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local dc1.int.company.com dc2.int.company.com dc3.int.company.com

options ndots:5

bash-4.4# ping 10.44.0.0
PING 10.44.0.0 (10.44.0.0): 56 data bytes
64 bytes from 10.44.0.0: seq=0 ttl=64 time=0.130 ms
64 bytes from 10.44.0.0: seq=1 ttl=64 time=0.097 ms
64 bytes from 10.44.0.0: seq=2 ttl=64 time=0.072 ms
64 bytes from 10.44.0.0: seq=3 ttl=64 time=0.102 ms
64 bytes from 10.44.0.0: seq=4 ttl=64 time=0.116 ms
64 bytes from 10.44.0.0: seq=5 ttl=64 time=0.099 ms
64 bytes from 10.44.0.0: seq=6 ttl=64 time=0.167 ms
64 bytes from 10.44.0.0: seq=7 ttl=64 time=0.086 ms
--- 10.44.0.0 ping statistics ---
8 packets transmitted, 8 packets received, 0% packet loss
round-trip min/avg/max = 0.072/0.108/0.167 ms

bash-4.4# ping somehost.env.dc1.int.company.com
ping: bad address 'somehost.env.dc1.int.company.com'

bash-4.4# ping 10.112.17.2
PING 10.112.17.2 (10.112.17.2): 56 data bytes
64 bytes from 10.112.17.2: seq=0 ttl=63 time=0.523 ms
64 bytes from 10.112.17.2: seq=1 ttl=63 time=0.319 ms
64 bytes from 10.112.17.2: seq=2 ttl=63 time=0.304 ms
--- 10.112.17.2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.304/0.382/0.523 ms

bash-4.4# ping worker1.env
ping: bad address 'worker1.env'

bash-4.4# ping 10.112.5.50
PING 10.112.5.50 (10.112.5.50): 56 data bytes
64 bytes from 10.112.5.50: seq=0 ttl=64 time=0.095 ms
64 bytes from 10.112.5.50: seq=1 ttl=64 time=0.073 ms
64 bytes from 10.112.5.50: seq=2 ttl=64 time=0.083 ms
--- 10.112.5.50 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.073/0.083/0.095 ms

这是kube-dns容器中的一些命令:
/ # ifconfig
eth0 Link encap:Ethernet HWaddr 9A:24:59:D1:09:52
inet addr:10.32.0.2 Bcast:10.47.255.255 Mask:255.240.0.0
UP BROADCAST RUNNING MULTICAST MTU:65535 Metric:1
RX packets:4387680 errors:0 dropped:0 overruns:0 frame:0
TX packets:4124267 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1047398761 (998.8 MiB) TX bytes:1038950587 (990.8 MiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:4352618 errors:0 dropped:0 overruns:0 frame:0
TX packets:4352618 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:359275782 (342.6 MiB) TX bytes:359275782 (342.6 MiB)

/ # ping somehost.env.dc1.int.company.com
PING somehost.env.dc1.int.company.com (10.112.17.2): 56 data bytes
64 bytes from 10.112.17.2: seq=0 ttl=63 time=0.430 ms
64 bytes from 10.112.17.2: seq=1 ttl=63 time=0.252 ms
--- somehost.env.dc1.int.company.com ping statistics ---
2 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.208/0.274/0.430 ms

/ # netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53152 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:sunproxyadmin 10.32.0.1:58424 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53174 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:sunproxyadmin 10.32.0.1:58468 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:sunproxyadmin 10.32.0.1:58446 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53096 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:sunproxyadmin 10.32.0.1:58490 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53218 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53100 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53158 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53180 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:sunproxyadmin 10.32.0.1:58402 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53202 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53178 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:sunproxyadmin 10.32.0.1:58368 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53134 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53200 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53136 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53130 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53222 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53196 TIME_WAIT
tcp 0 0 kube-dns-86f4d74b45-2kxdr:48230 10.96.0.1:https ESTABLISHED
tcp 0 0 kube-dns-86f4d74b45-2kxdr:10054 10.32.0.1:53102 TIME_WAIT
netstat: /proc/net/tcp6: No such file or directory
netstat: /proc/net/udp6: No such file or directory
netstat: /proc/net/raw6: No such file or directory
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node Path

master + worker节点上的版本/ OS信息:
kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:55:54Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.0", GitCommit:"fc32d2f3698e36b93322a3465f63a14e9f0eaead", GitTreeState:"clean", BuildDate:"2018-03-26T16:44:10Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

uname -a
Linux master1.env.dc1.int.company.com 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

最佳答案

不访问集群很难说,但是当您创建Pod时,kube-proxy在您的节点上创建了多个iptables规则,以便您可以访问它们。我的猜测是一个或多个iptables规则弄乱了您的新和现有Pod。

然后,当您删除并重新创建kube-dns pod时,这些iptables将被删除并重新创建,从而使情况恢复正常。

您可以尝试以下操作:

  • 升级到使用core-dns的K8s 1.11。
  • 尝试安装使用不同podCidr的其他网络覆盖
  • 尝试重新启动覆盖 Pane (例如,印花布 Pane )

  • 所有这些都会导致停机,并可能破坏群集。因此,最好先创建一个新集群并进行测试。

    关于dns - 经过一段时间后,kube-dns停止工作,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/52597484/

    24 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com