
kubernetes - DNS resolution problem in a Kubernetes cluster

Reposted. Author: 行者123. Updated: 2023-12-02 11:39:13

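We have a Kubernetes cluster made up of four worker nodes and one master node. On worker1 and worker2 we cannot resolve DNS names, but on the other two nodes everything works. I followed the instructions in the official documentation and realized that the coredns pods do not receive any queries from worker1 and worker2.
To repeat: everything is fine on worker3 and worker4, and the problem is only on worker1 and worker2. For example, when I run a busybox container on worker1 and execute nslookup kubernetes.default, it returns nothing, but when the container runs on worker3, DNS resolution works.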

Cluster information:

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:43:08Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

$ kubectl get pod -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
coredns-576cbf47c7-6dtrc                1/1     Running   5          82d
coredns-576cbf47c7-jvx5l                1/1     Running   6          82d
etcd-master                             1/1     Running   35         298d
kube-apiserver-master                   1/1     Running   14         135m
kube-controller-manager-master          1/1     Running   42         298d
kube-proxy-22f49                        1/1     Running   9          91d
kube-proxy-2s9sx                        1/1     Running   34         298d
kube-proxy-jh2m7                        1/1     Running   5          81d
kube-proxy-rc5r8                        1/1     Running   5          63d
kube-proxy-vg8jd                        1/1     Running   6          104d
kube-scheduler-master                   1/1     Running   39         298d
kubernetes-dashboard-65c76f6c97-7cwwp   1/1     Running   45         293d
tiller-deploy-779784fbd6-dzq7k          1/1     Running   5          87d
weave-net-556ml                         2/2     Running   12         66d
weave-net-h9km9                         2/2     Running   15         81d
weave-net-s88z4                         2/2     Running   0          145m
weave-net-smrgc                         2/2     Running   14         63d
weave-net-xf6ng                         2/2     Running   15         82d

$ kubectl logs coredns-576cbf47c7-6dtrc -n kube-system | tail -20
10.44.0.28:32837 - [14/Dec/2019:12:22:51 +0000] 2957 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000661167s
10.44.0.28:51373 - [14/Dec/2019:12:25:09 +0000] 46278 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000440918s
10.44.0.28:51373 - [14/Dec/2019:12:25:09 +0000] 47697 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.00059741s
10.44.0.28:44969 - [14/Dec/2019:12:27:27 +0000] 33222 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.00044739s
10.44.0.28:44969 - [14/Dec/2019:12:27:27 +0000] 52126 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.000310494s
10.44.0.28:39392 - [14/Dec/2019:12:29:11 +0000] 41041 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000481309s
10.44.0.28:40999 - [14/Dec/2019:12:29:11 +0000] 695 "AAAA IN spark-master.svc.cluster.local. udp 48 false 512" NXDOMAIN qr,aa,rd,ra 141 0.000247078s
10.44.0.28:54835 - [14/Dec/2019:12:29:12 +0000] 59604 "AAAA IN spark-master. udp 30 false 512" NXDOMAIN qr,rd,ra 106 0.020408006s
10.44.0.28:38604 - [14/Dec/2019:12:29:15 +0000] 53244 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.000209231s
10.44.0.28:38604 - [14/Dec/2019:12:29:15 +0000] 23079 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,rd,ra 149 0.000191722s
10.44.0.28:57478 - [14/Dec/2019:12:32:15 +0000] 15451 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000383919s
10.44.0.28:57478 - [14/Dec/2019:12:32:15 +0000] 45086 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.001197812s
10.40.0.34:54678 - [14/Dec/2019:12:52:31 +0000] 6509 "A IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000522769s
10.40.0.34:60234 - [14/Dec/2019:12:52:31 +0000] 15538 "AAAA IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000851171s
10.40.0.34:43989 - [14/Dec/2019:12:52:31 +0000] 2712 "AAAA IN kubernetes.default.svc.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000306038s
10.40.0.34:59265 - [14/Dec/2019:12:52:31 +0000] 23765 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000274748s
10.40.0.34:45622 - [14/Dec/2019:13:26:31 +0000] 38766 "AAAA IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000436681s
10.40.0.34:42759 - [14/Dec/2019:13:26:31 +0000] 56753 "A IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000706638s
10.40.0.34:39563 - [14/Dec/2019:13:26:31 +0000] 37876 "AAAA IN kubernetes.default.svc.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000445999s
10.40.0.34:57224 - [14/Dec/2019:13:26:31 +0000] 33157 "A IN kubernetes.default.svc.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000536896s
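
Only two client addresses, 10.44.0.28 and 10.40.0.34, appear in this excerpt. To check whether queries from pods on a given node ever reach CoreDNS, the log can be tallied per client IP and compared with the pod IPs shown by kubectl get pod -o wide. The following is a minimal Python sketch, assuming the log lines keep the format shown above; the names LOG_LINE and clients_seen are illustrative and not part of any Kubernetes tooling.

```python
import re
from collections import Counter

# Matches CoreDNS log lines of the form shown above:
# <client-ip>:<port> - [<timestamp>] <id> "<type> IN <name> ..." <rcode> ...
LOG_LINE = re.compile(
    r'^(?P<ip>\d+\.\d+\.\d+\.\d+):\d+ - \[[^\]]+\] \d+ '
    r'"(?P<qtype>\S+) IN (?P<name>\S+) [^"]*" (?P<rcode>\S+)'
)


def clients_seen(log_text):
    """Count queries per client IP; lines that do not match are skipped."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.group("ip")] += 1
    return counts


# Three lines copied from the log excerpt above.
sample = """\
10.44.0.28:38604 - [14/Dec/2019:12:29:15 +0000] 53244 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.000209231s
10.40.0.34:59265 - [14/Dec/2019:12:52:31 +0000] 23765 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000274748s
10.40.0.34:57224 - [14/Dec/2019:13:26:31 +0000] 33157 "A IN kubernetes.default.svc.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000536896s
"""

for ip, count in sorted(clients_seen(sample).items()):
    print(ip, count)
```

If the IP of the busybox pod on worker1 never shows up in such a tally while the one on worker3 does, that supports the observation that the queries are lost before they reach CoreDNS.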

$ kubectl get svc -n kube-system
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
kube-dns               ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP   298d
kubernetes-dashboard   ClusterIP   10.96.204.236   <none>        443/TCP         298d
tiller-deploy          ClusterIP   10.110.41.66    <none>        44134/TCP       123d

$ kubectl get ep kube-dns --namespace=kube-system
NAME       ENDPOINTS                                               AGE
kube-dns   10.32.0.98:53,10.44.0.21:53,10.32.0.98:53 + 1 more...   298d

When busybox runs on worker1:
$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1

But when busybox runs on worker3:
$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10
Name: kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local

All nodes run Ubuntu 16.04.

The contents of /etc/resolv.conf are identical in all pods.

The only difference I could find is in the kube-proxy logs:

kube-proxy log on a working node:
$ kubectl logs kube-proxy-vg8jd -n kube-system

W1214 06:12:19.201889 1 server_others.go:295] Flag proxy-mode="" unknown, assuming iptables proxy
I1214 06:12:19.321747 1 server_others.go:148] Using iptables Proxier.
W1214 06:12:19.332725 1 proxier.go:317] clusterCIDR not specified, unable to distinguish between internal and external traffic
I1214 06:12:19.332949 1 server_others.go:178] Tearing down inactive rules.
I1214 06:12:20.557875 1 server.go:447] Version: v1.12.1
I1214 06:12:20.601081 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I1214 06:12:20.601393 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1214 06:12:20.601958 1 conntrack.go:83] Setting conntrack hashsize to 32768
I1214 06:12:20.602234 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I1214 06:12:20.602300 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I1214 06:12:20.602544 1 config.go:202] Starting service config controller
I1214 06:12:20.602561 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I1214 06:12:20.602585 1 config.go:102] Starting endpoints config controller
I1214 06:12:20.602619 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I1214 06:12:20.702774 1 controller_utils.go:1034] Caches are synced for service config controller
I1214 06:12:20.702827 1 controller_utils.go:1034] Caches are synced for endpoints config controller

kube-proxy log on a non-working node:
$ kubectl logs kube-proxy-fgzpf -n kube-system

W1215 12:47:12.660749 1 server_others.go:295] Flag proxy-mode="" unknown, assuming iptables proxy
I1215 12:47:12.679348 1 server_others.go:148] Using iptables Proxier.
W1215 12:47:12.679538 1 proxier.go:317] clusterCIDR not specified, unable to distinguish between internal and external traffic
I1215 12:47:12.679665 1 server_others.go:178] Tearing down inactive rules.
E1215 12:47:12.760702 1 proxier.go:529] Error removing iptables rules in ipvs proxier: error deleting chain "KUBE-MARK-MASQ": exit status 1: iptables: Too many links.
I1215 12:47:12.799926 1 server.go:447] Version: v1.12.1
I1215 12:47:12.832047 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I1215 12:47:12.833067 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1215 12:47:12.833266 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I1215 12:47:12.833498 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I1215 12:47:12.833934 1 config.go:202] Starting service config controller
I1215 12:47:12.834061 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I1215 12:47:12.834253 1 config.go:102] Starting endpoints config controller
I1215 12:47:12.834338 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I1215 12:47:12.934408 1 controller_utils.go:1034] Caches are synced for service config controller
I1215 12:47:12.934564 1 controller_utils.go:1034] Caches are synced for endpoints config controller

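The fifth line of this log (the "Error removing iptables rules" error) does not appear in the first one. I don't know whether it is related to the problem.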
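
Comparing two such logs by eye is error-prone because every line carries a different timestamp. A rough Python sketch of a timestamp-insensitive comparison follows, assuming the klog-style prefix seen above (severity letter, date, time, PID); the helper names messages and only_in_first are made up for illustration, and the sample strings are a few lines copied from the two logs.

```python
import re

# klog prefix, e.g. "I1214 06:12:20.557875 1 " (severity+MMDD, time, PID).
KLOG_PREFIX = re.compile(r'^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ ')


def messages(log_text):
    """Return non-empty log lines with the severity/timestamp/PID prefix removed."""
    result = []
    for line in log_text.splitlines():
        line = line.strip()
        if line:
            result.append(KLOG_PREFIX.sub("", line))
    return result


def only_in_first(first, second):
    """Messages that occur in `first` but not in `second`, in original order."""
    seen = set(messages(second))
    return [m for m in messages(first) if m not in seen]


working = """\
I1214 06:12:19.332949 1 server_others.go:178] Tearing down inactive rules.
I1214 06:12:20.557875 1 server.go:447] Version: v1.12.1
I1214 06:12:20.601393 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1214 06:12:20.601958 1 conntrack.go:83] Setting conntrack hashsize to 32768
"""

failing = """\
I1215 12:47:12.679665 1 server_others.go:178] Tearing down inactive rules.
E1215 12:47:12.760702 1 proxier.go:529] Error removing iptables rules in ipvs proxier: error deleting chain "KUBE-MARK-MASQ": exit status 1: iptables: Too many links.
I1215 12:47:12.799926 1 server.go:447] Version: v1.12.1
I1215 12:47:12.833067 1 conntrack.go:52] Setting nf_conntrack_max to 131072
"""

for message in only_in_first(failing, working):
    print("only on the non-working node:", message)
for message in only_in_first(working, failing):
    print("only on the working node:", message)
```

Applied to the full logs above, this kind of comparison would also surface the "Setting conntrack hashsize to 32768" line, which is present only in the working node's log. Whether either difference explains the DNS failure is not established here.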

Any suggestions are welcome.

Best answer

The doubled svc.svc in kubernetes.default.svc.svc.cluster.local looks suspicious. Check whether it is the same in the coredns-576cbf47c7-6dtrc pod.
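
One possible reading of those names, offered as an assumption rather than a confirmed diagnosis: a pod's /etc/resolv.conf normally lists search domains such as <namespace>.svc.cluster.local, svc.cluster.local and cluster.local with options ndots:5, so a short name is tried with each suffix in turn. The Python sketch below imitates that glibc-style expansion in simplified form (other resolvers, such as the one in busybox images, may behave differently). The function name candidate_names is hypothetical, and the monitoring namespace is inferred from the logged query names.

```python
def candidate_names(name, search, ndots=5):
    """Names a glibc-style resolver would try, in order, for `name`.

    Simplified model: a name with a trailing dot is used as-is; a name with
    fewer than `ndots` dots is tried with each search suffix first and as an
    absolute name last; otherwise the absolute name is tried first.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


# Assumed search list for a pod in a namespace called "monitoring".
search = ["monitoring.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for candidate in candidate_names("kubernetes.default.svc", search):
    print(candidate)
```

Under these assumptions, a lookup of kubernetes.default.svc produces kubernetes.default.svc.monitoring.svc.cluster.local, then kubernetes.default.svc.svc.cluster.local, then kubernetes.default.svc.cluster.local, which is the same sequence of NXDOMAIN, NXDOMAIN, NOERROR seen from 10.40.0.34 in the CoreDNS log.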

Shut down the coredns-576cbf47c7-6dtrc pod to guarantee that the single remaining DNS instance answers the DNS queries from all worker nodes.

According to the docs, problems like this "... indicate a problem with the coredns/kube-dns add-on or associated Services". Restarting coredns may resolve the issue.

I would also add /etc/resolv.conf on the nodes to the list of things to check and compare.
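
As a rough illustration of that comparison, the Python sketch below parses two resolv.conf texts and reports the fields that differ. It handles only the nameserver, search and options keywords, and the sample contents and addresses are invented placeholders, not taken from the cluster in the question.

```python
def parse_resolv_conf(text):
    """Parse resolv.conf text into nameservers, search domains and options."""
    parsed = {"nameserver": [], "search": [], "options": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # skip blank lines and comments
            continue
        keyword, *values = line.split()
        if keyword == "nameserver":
            parsed["nameserver"].extend(values[:1])
        elif keyword == "search":
            parsed["search"] = values  # a later search line replaces an earlier one
        elif keyword == "options":
            parsed["options"].extend(values)
    return parsed


def differences(text_a, text_b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    a, b = parse_resolv_conf(text_a), parse_resolv_conf(text_b)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


# Hypothetical file contents from two nodes.
node_a = "nameserver 10.0.0.2\nsearch example.internal\n"
node_b = "# generated file\nnameserver 127.0.1.1\nsearch example.internal\n"

print(differences(node_a, node_b))
```

An empty result means the two files agree on these three fields; any entry that appears is a candidate for closer inspection on the affected nodes.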

Regarding "kubernetes - DNS resolution problem in a Kubernetes cluster", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59335802/
