kubernetes - kube-dns getsockopt no route to host

I'm struggling to understand how to correctly configure kube-dns with flannel on kubernetes 1.10, with containerd as the CRI.

kube-dns fails to come up, with the following errors:

kubectl -n kube-system logs kube-dns-595fdb6c46-9tvn9 -c kubedns
I0424 14:56:34.944476 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:35.444469 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
E0424 14:56:35.815863 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host
E0424 14:56:35.815863 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host
I0424 14:56:35.944444 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:36.444462 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:36.944507 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
F0424 14:56:37.444434 1 dns.go:209] Timeout waiting for initialization

kubectl -n kube-system describe pod kube-dns-595fdb6c46-9tvn9
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 47m (x181 over 3h) kubelet, worker1 Readiness probe failed: Get http://10.244.0.2:8081/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning BackOff 27m (x519 over 3h) kubelet, worker1 Back-off restarting failed container
Normal Killing 17m (x44 over 3h) kubelet, worker1 Killing container with id containerd://dnsmasq:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 12m (x178 over 3h) kubelet, worker1 Liveness probe failed: Get http://10.244.0.2:10054/metrics: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning BackOff 2m (x855 over 3h) kubelet, worker1 Back-off restarting failed container

There is indeed no route to the 10.96.0.1 endpoint:
ip route
default via 10.240.0.254 dev ens160
10.240.0.0/24 dev ens160 proto kernel scope link src 10.240.0.21
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.0.0/16 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
10.244.4.0/24 via 10.244.4.0 dev flannel.1 onlink
10.244.5.0/24 via 10.244.5.0 dev flannel.1 onlink

What is responsible for configuring the cluster service address range and its associated routes? Is it the container runtime, the overlay network (flannel in this case), or something else? Where should it be configured?
10-containerd-net.conflist configures the bridge between the host and my pod network. Can the service network be configured there too?
cat /etc/cni/net.d/10-containerd-net.conflist 
{
  "cniVersion": "0.3.1",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "promiscMode": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.244.0.0/16",
        "routes": [
          { "dst": "0.0.0.0/0" }
        ]
      }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true}
    }
  ]
}
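
For reference, the service address range itself does not appear to be part of any CNI configuration; it comes from kube-apiserver's --service-cluster-ip-range flag, and what I can't work out is what is supposed to install the routes/rules for it on the nodes. To double-check the configured range on one of the masters (a rough sketch - it assumes kube-apiserver runs there as a plain process, and 10.240.0.11 is one of the masters behind the load balancer described further down):

ssh 10.240.0.11 "ps -ef | grep [k]ube-apiserver" | tr ' ' '\n' | grep -- --service-cluster-ip-range
# e.g. --service-cluster-ip-range=10.96.0.0/12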

EDIT:

Just came across this, from 2016:

As of a few weeks ago (I forget the release but it was a 1.2.x where x != 0) (#24429) we fixed the routing such that any traffic that arrives at a node destined for a service IP will be handled as if it came to a node port. This means you should be able to set yo static routes for your service cluster IP range to one or more nodes and the nodes will act as bridges. This is the same trick most people do with flannel to bridge the overlay.

It's imperfect but it works. In the future will will need to get more precise with the routing if you want optimal behavior (i.e. not losing the client IP), or we will see more non-kube-proxy implementations of services.



Is this still relevant? Do I need to set up static routes for the service CIDR? Or is the problem actually with kube-proxy rather than with flannel or containerd?
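
Before blaming flannel, one check that seems relevant is whether kube-proxy has programmed anything at all for the service VIP on this node. A sketch, assuming IPVS mode as configured below and that ipvsadm is installed on the worker:

ip addr show kube-ipvs0                  # in IPVS mode the service IPs (10.96.0.1, 10.96.0.10, ...) should be bound here
ipvsadm -Ln | grep -A 2 "10.96.0.1:443"  # should show a TCP virtual server whose real servers are the apiservers

If neither exists, nothing on the node is handling 10.96.0.1 at all, which would explain the failed dials regardless of the pod network.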

My flannel config:
cat /etc/cni/net.d/10-flannel.conflist 
{
  "name": "cbr0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
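
For completeness: the flannel CNI plugin only delegates the pod network; the per-node subnet it hands to the bridge comes from the environment file flannel writes, and nothing here knows about the service range. What that file typically looks like on this node (the values are inferred from the routes above, and the MTU is just a common vxlan default):

cat /run/flannel/subnet.env
# FLANNEL_NETWORK=10.244.0.0/16
# FLANNEL_SUBNET=10.244.0.1/24
# FLANNEL_MTU=1450
# FLANNEL_IPMASQ=true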

And kube-proxy:
[Unit]
Description=Kubernetes Kube Proxy
Documentation=https://github.com/kubernetes/kubernetes

[Service]
ExecStart=/usr/local/bin/kube-proxy \
--cluster-cidr=10.244.0.0/16 \
--feature-gates=SupportIPVSProxyMode=true \
--ipvs-min-sync-period=5s \
--ipvs-sync-period=5s \
--ipvs-scheduler=rr \
--kubeconfig=/etc/kubernetes/kube-proxy.conf \
--logtostderr=true \
--master=https://192.168.160.1:6443 \
--proxy-mode=ipvs \
--v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

EDIT:

Having gone through the kube-proxy debugging steps, it appears that kube-proxy cannot contact the master. I suspect this is a large part of the problem. I have 3 controller/master nodes behind an HAProxy load balancer bound to 192.168.160.1:6443, which round-robins to each of the masters on 10.240.0.1[1|2|3]:6443. This can be seen in the output/configs above.
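
To separate "kube-proxy is misconfigured" from "the node cannot reach the load balancer at all", a basic reachability check from the worker seems worthwhile (a sketch; any HTTP response, even a 401/403, proves TCP connectivity through HAProxy to an apiserver):

curl -k https://192.168.160.1:6443/healthz
# "ok" (or any HTTP error) => the LB and an apiserver are reachable from this node
# a hang or "No route to host" => the problem is node-to-LB networking, not kubernetes itself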

In kube-proxy.service I have specified --master=192.168.160.1:6443. Why are connections being attempted to port 443? Can I change this - there doesn't seem to be a port flag? Does it need to be port 443 for some reason?

Best Answer

There are two parts to this answer: one about running kube-proxy, and one about where those :443 URLs come from.

First, about kube-proxy: please do not run kube-proxy as a system service like that. It is designed to be launched by the kubelet inside the cluster so that the SDN addresses behave sanely, since they are effectively "fake" addresses. Running kube-proxy outside the kubelet's control will lead to all kinds of strange behaviour unless you spend a great deal of effort replicating the way the kubelet configures its subordinate docker containers.
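
For comparison, kubeadm-built clusters run kube-proxy as a DaemonSet in kube-system, so the kubelet starts and supervises it on every node and its settings live in a ConfigMap. In a hand-rolled cluster you would replicate something equivalent; the commands below only show what that looks like in a kubeadm-style cluster (they will return nothing in this cluster as it stands):

kubectl -n kube-system get daemonset kube-proxy -o wide
kubectl -n kube-system get configmap kube-proxy -o yaml   # kubeconfig and proxy settings live here in kubeadm clusters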

Now, about the :443 URLs:

E0424 14:56:35.815863 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host

...

Why are connections being attempted to port 443? Can I change this - there doesn't seem to be a port flag? Does it need to be port 443 for some reason?



10.96.0.1 comes from your cluster's Service CIDR, which is (and should be) separate from the Pod CIDR, which in turn should be separate from the nodes' subnets, and so on. The .1 of the cluster's Service CIDR is reserved (or traditionally allocated) for the kubernetes.default.svc.cluster.local Service, whose single Service.port is 443.
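
Both halves of that mapping are visible with kubectl; the output below is only illustrative, and the endpoint list will be whatever your three apiservers advertise:

kubectl get svc kubernetes
# NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
# kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   ...
kubectl get endpoints kubernetes
# NAME         ENDPOINTS                                             AGE
# kubernetes   10.240.0.11:6443,10.240.0.12:6443,10.240.0.13:6443   ...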

I'm not entirely sure why the --master flag doesn't supersede the value in /etc/kubernetes/kube-proxy.conf, but since that file is very clearly only supposed to be used by kube-proxy, why not just update the value in the file and remove all doubt?
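
A sketch of doing exactly that with kubectl (the cluster entry name "kubernetes" is a guess - use whatever name actually appears in the file):

kubectl config set-cluster kubernetes \
  --server=https://192.168.160.1:6443 \
  --kubeconfig=/etc/kubernetes/kube-proxy.conf
kubectl config view --kubeconfig=/etc/kubernetes/kube-proxy.conf | grep server   # confirm the change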

Regarding kubernetes - kube-dns getsockopt no route to host, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50005064/
