
api - Impaired/delayed connections to cluster IPs from the k8s master node


I am running Kubernetes 1.17 with flannel:v0.11.0 on CentOS 7 and have problems reaching my CLUSTER-IPs from the control plane.

I installed and set up the cluster manually using kubeadm.

This is basically my cluster:

k8s-master-01 10.0.0.50/24
k8s-worker-01 10.0.0.60/24
k8s-worker-02 10.0.0.61/24

Pod CIDR: 10.244.0.0/16
Service CIDR: 10.96.0.0/12
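
For reference, a cluster with this layout is typically bootstrapped roughly as follows (a hedged sketch with placeholder token/hash, not the exact commands used here):

# On k8s-master-01 (advertise the API server on the private eth1 address):
kubeadm init \
  --apiserver-advertise-address=10.0.0.50 \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12

# On each worker node (join command as printed by kubeadm init; token/hash are placeholders):
kubeadm join 10.0.0.50:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>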

Hint: every node has two NICs (eth0: uplink, eth1: private). The IPs listed above are assigned to eth1. kubelet, kube-proxy and flannel are configured to send/receive their traffic over the private network on eth1.
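
In such a dual-NIC setup this usually means pinning kubelet and flannel to the private interface. A hedged sketch of what that configuration commonly looks like (standard kubelet/flanneld flag names, not copied from my manifests):

# /etc/sysconfig/kubelet (or the kubeadm drop-in) on each node, using that node's eth1 address:
KUBELET_EXTRA_ARGS=--node-ip=10.0.0.50

# kube-flannel.yml, args of the kube-flannel container in the DaemonSet:
#   - --ip-masq
#   - --kube-subnet-mgr
#   - --iface=eth1        # send the VXLAN traffic over the private NIC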

I first ran into this problem when I tried to serve the metrics-server API through the kube-apiserver. I followed the instructions from here. The control plane does not seem to communicate properly with the service network.

These are the pods in my kube-system namespace:

$ kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-6955765f44-jrbs6 0/1 Running 9 24d 10.244.0.30 k8s-master-01 <none> <none>
coredns-6955765f44-mwn2l 1/1 Running 8 24d 10.244.1.37 k8s-worker-01 <none> <none>
etcd-k8s-master-01 1/1 Running 9 24d 10.0.0.50 k8s-master-01 <none> <none>
kube-apiserver-k8s-master-01 1/1 Running 0 2m26s 10.0.0.50 k8s-master-01 <none> <none>
kube-controller-manager-k8s-master-01 1/1 Running 15 24d 10.0.0.50 k8s-master-01 <none> <none>
kube-flannel-ds-amd64-7d6jq 1/1 Running 11 26d 10.0.0.60 k8s-worker-01 <none> <none>
kube-flannel-ds-amd64-c5rj2 1/1 Running 11 26d 10.0.0.50 k8s-master-01 <none> <none>
kube-flannel-ds-amd64-dsg6l 1/1 Running 11 26d 10.0.0.61 k8s-worker-02 <none> <none>
kube-proxy-mrz9v 1/1 Running 10 24d 10.0.0.50 k8s-master-01 <none> <none>
kube-proxy-slt95 1/1 Running 9 24d 10.0.0.61 k8s-worker-02 <none> <none>
kube-proxy-txlrp 1/1 Running 9 24d 10.0.0.60 k8s-worker-01 <none> <none>
kube-scheduler-k8s-master-01 1/1 Running 14 24d 10.0.0.50 k8s-master-01 <none> <none>
metrics-server-67684d476-mrvj2 1/1 Running 2 7d23h 10.244.2.43 k8s-worker-02 <none> <none>

These are my services:

$ kubectl get services --all-namespaces -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 26d <none>
default phpdemo ClusterIP 10.96.52.157 <none> 80/TCP 11d app=phpdemo
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 26d k8s-app=kube-dns
kube-system metrics-server ClusterIP 10.96.71.138 <none> 443/TCP 5d3h k8s-app=metrics-server
kubernetes-dashboard dashboard-metrics-scraper ClusterIP 10.99.136.237 <none> 8000/TCP 23d k8s-app=dashboard-metrics-scraper
kubernetes-dashboard kubernetes-dashboard ClusterIP 10.97.209.113 <none> 443/TCP 23d k8s-app=kubernetes-dashboard

The Metrics API does not work because the connectivity check fails:

$ kubectl describe apiservice v1beta1.metrics.k8s.io
...
Status:
Conditions:
Last Transition Time: 2019-12-27T21:25:01Z
Message: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Reason: FailedDiscoveryCheck
Status: False
Type:

The kube-apiserver gets no connection:

$ kubectl logs --tail=20 kube-apiserver-k8s-master-01 -n kube-system
...
I0101 22:27:00.712413 1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
W0101 22:27:00.712514 1 handler_proxy.go:97] no RequestInfo found in the context
E0101 22:27:00.712559 1 controller.go:114] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0101 22:27:00.712591 1 controller.go:127] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E0101 22:27:04.712991 1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:09.714801 1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:34.709557 1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
E0101 22:27:39.714173 1 available_controller.go:419] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I tried to figure out what is going on in the kube-apiserver and could finally confirm the problem: I get a delayed response after more than 60 seconds (unfortunately, time is not installed in that container):

$ kubectl exec -it kube-apiserver-k8s-master-01 -n kube-system -- /bin/sh
# echo -e "GET /apis/metrics.k8s.io/v1beta1 HTTP/1.1\r\nHost:v1beta1.metrics.k8s.io\r\n" | openssl s_client -connect 10.96.71.138:443 -quiet
Can't use SSL_get_servername
depth=1 CN = localhost-ca@1577481905
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=1 CN = localhost-ca@1577481905
verify return:1
depth=0 CN = localhost@1577481906
verify return:1
HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
Connection: close

The same command works from two of my own test pods (one on each of the two worker nodes), so the service IP is reachable from the pod network on the worker nodes:

$ kubectl exec -it phpdemo-55858f97c4-fjc6q -- /bin/sh
/usr/local/bin # echo -e "GET /apis/metrics.k8s.io/v1beta1 HTTP/1.1\r\nHost:v1beta1.metrics.k8s.io\r\n" | openssl s_client -connect 10.96.71.138:443 -quiet
Can't use SSL_get_servername
depth=1 CN = localhost-ca@1577481905
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=1 CN = localhost-ca@1577481905
verify return:1
depth=0 CN = localhost@1577481906
verify return:1
HTTP/1.1 403 Forbidden
Content-Type: application/json
X-Content-Type-Options: nosniff
Date: Wed, 01 Jan 2020 22:53:44 GMT
Content-Length: 212

{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"","reason":"Forbidden","details":{},"code":403}

And the same from a worker node itself:

[root@k8s-worker-02 ~ ] time curl -k https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {

},
"status": "Failure",
"message": "forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"",
"reason": "Forbidden",
"details": {

},
"code": 403
}
real 0m0.146s
user 0m0.048s
sys 0m0.089s

This does not work from my master node; I get a delayed response after more than 60 seconds:

[root@k8s-master-01 ~ ] time curl -k https://10.96.71.138:443/apis/metrics.k8s.io/v1beta1
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {

},
"status": "Failure",
"message": "forbidden: User \"system:anonymous\" cannot get path \"/apis/metrics.k8s.io/v1beta1\"",
"reason": "Forbidden",
"details": {

},
"code": 403
}
real 1m3.248s
user 0m0.061s
sys 0m0.079s

From the master node I can see a lot of unreplied SYN_SENT packets:

[root@k8s-master-01 ~ ] conntrack -L -d 10.96.71.138
tcp 6 75 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48550 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=19813 mark=0 use=1
tcp 6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48287 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=23710 mark=0 use=1
tcp 6 40 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48422 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=24286 mark=0 use=1
tcp 6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48286 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=35030 mark=0 use=1
tcp 6 80 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48574 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=40636 mark=0 use=1
tcp 6 50 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48464 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=65512 mark=0 use=1
tcp 6 5 SYN_SENT src=10.0.2.15 dst=10.96.71.138 sport=48290 dport=443 [UNREPLIED] src=10.244.2.38 dst=10.244.0.0 sport=4443 dport=47617 mark=0 use=1
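
Note, as an observation rather than a conclusion: the source address in these entries is 10.0.2.15, which looks like a default NAT interface address (typical for Vagrant/VirtualBox setups) rather than eth1's 10.0.0.50. A quick way to check which route and source IP the master picks for these destinations:

# Destination pod IP taken from the conntrack entries above:
ip route get 10.244.2.38
# The service IP itself (kube-proxy DNATs it before the routing decision):
ip route get 10.96.71.138
# Which local address the VXLAN interface is bound to:
ip -d link show flannel.1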

The iptables rules are in place:

[root@k8s-master-01 ~ ] iptables-save | grep 10.96.71.138
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.71.138/32 -p tcp -m comment --comment "kube-system/metrics-server: cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.71.138/32 -p tcp -m comment --comment "kube-system/metrics-server: cluster IP" -m tcp --dport 443 -j KUBE-SVC-LC5QY66VUV2HJ6WZ
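
To rule out a missing endpoint behind that chain, the KUBE-SVC chain can be followed down to its KUBE-SEP rules and compared with the endpoints object (a diagnostic sketch; the chain name is taken from the output above):

# Follow the service chain to its endpoint (KUBE-SEP) rules:
iptables-save -t nat | grep KUBE-SVC-LC5QY66VUV2HJ6WZ
# Confirm the Service actually has the metrics-server pod as an endpoint:
kubectl -n kube-system get endpoints metrics-server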

kube-proxy is up and running on every node:

$ kubectl get pods -A -o wide
...
kube-system kube-proxy-mrz9v 1/1 Running 10 21d 10.0.0.50 k8s-master-01 <none> <none>
kube-system kube-proxy-slt95 1/1 Running 9 21d 10.0.0.61 k8s-worker-02 <none> <none>
kube-system kube-proxy-txlrp 1/1 Running 9 21d 10.0.0.60 k8s-worker-01 <none> <none>
$ kubectl -n kube-system logs kube-proxy-mrz9v
W0101 21:31:14.268698 1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:14.283958 1 node.go:135] Successfully retrieved node IP: 10.0.0.50
I0101 21:31:14.284034 1 server_others.go:145] Using iptables Proxier.
I0101 21:31:14.284624 1 server.go:571] Version: v1.17.0
I0101 21:31:14.286031 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:14.286093 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:14.287207 1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:14.298760 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:14.298984 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:14.300618 1 config.go:313] Starting service config controller
I0101 21:31:14.300665 1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:14.300720 1 config.go:131] Starting endpoints config controller
I0101 21:31:14.300740 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:14.400864 1 shared_informer.go:204] Caches are synced for service config
I0101 21:31:14.401021 1 shared_informer.go:204] Caches are synced for endpoints config

$ kubectl -n kube-system logs kube-proxy-slt95
W0101 21:31:13.856897 1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:13.905653 1 node.go:135] Successfully retrieved node IP: 10.0.0.61
I0101 21:31:13.905704 1 server_others.go:145] Using iptables Proxier.
I0101 21:31:13.906370 1 server.go:571] Version: v1.17.0
I0101 21:31:13.906983 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:13.907032 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:13.907413 1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:13.912221 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:13.912321 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:13.915322 1 config.go:313] Starting service config controller
I0101 21:31:13.915353 1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:13.915755 1 config.go:131] Starting endpoints config controller
I0101 21:31:13.915779 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:14.016995 1 shared_informer.go:204] Caches are synced for endpoints config
I0101 21:31:14.017115 1 shared_informer.go:204] Caches are synced for service config

$ kubectl -n kube-system logs kube-proxy-txlrp
W0101 21:31:13.552518 1 server_others.go:323] Unknown proxy mode "", assuming iptables proxy
I0101 21:31:13.696793 1 node.go:135] Successfully retrieved node IP: 10.0.0.60
I0101 21:31:13.696846 1 server_others.go:145] Using iptables Proxier.
I0101 21:31:13.697396 1 server.go:571] Version: v1.17.0
I0101 21:31:13.698000 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0101 21:31:13.698101 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0101 21:31:13.698509 1 conntrack.go:83] Setting conntrack hashsize to 32768
I0101 21:31:13.704280 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0101 21:31:13.704467 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0101 21:31:13.704888 1 config.go:131] Starting endpoints config controller
I0101 21:31:13.704935 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0101 21:31:13.705046 1 config.go:313] Starting service config controller
I0101 21:31:13.705059 1 shared_informer.go:197] Waiting for caches to sync for service config
I0101 21:31:13.806299 1 shared_informer.go:204] Caches are synced for endpoints config
I0101 21:31:13.806430 1 shared_informer.go:204] Caches are synced for service config

These are my (default) kube-proxy settings:

$ kubectl -n kube-system get configmap kube-proxy -o yaml
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 5
    clusterCIDR: 10.244.0.0/16
    configSyncPeriod: 15m0s
    conntrack:
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 30s
    kind: KubeProxyConfiguration
    metricsBindAddress: 127.0.0.1:10249
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: -999
    portRange: ""
    udpIdleTimeout: 250ms
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://10.0.0.50:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2019-12-06T22:07:40Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "185"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: bac4a8df-e318-4c91-a6ed-9305e58ac6d9
$ kubectl -n kube-system get daemonset kube-proxy -o yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "2"
  creationTimestamp: "2019-12-06T22:07:40Z"
  generation: 2
  labels:
    k8s-app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "115436"
  selfLink: /apis/apps/v1/namespaces/kube-system/daemonsets/kube-proxy
  uid: 64a53d29-1eaa-424f-9ebd-606bcdb3169c
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kube-proxy
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kube-proxy
    spec:
      containers:
      - command:
        - /usr/local/bin/kube-proxy
        - --config=/var/lib/kube-proxy/config.conf
        - --hostname-override=$(NODE_NAME)
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: k8s.gcr.io/kube-proxy:v1.17.0
        imagePullPolicy: IfNotPresent
        name: kube-proxy
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/kube-proxy
          name: kube-proxy
        - mountPath: /run/xtables.lock
          name: xtables-lock
        - mountPath: /lib/modules
          name: lib-modules
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/os: linux
      priorityClassName: system-node-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kube-proxy
      serviceAccountName: kube-proxy
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - operator: Exists
      volumes:
      - configMap:
          defaultMode: 420
          name: kube-proxy
        name: kube-proxy
      - hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        name: xtables-lock
      - hostPath:
          path: /lib/modules
          type: ""
        name: lib-modules
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 3
  desiredNumberScheduled: 3
  numberAvailable: 3
  numberMisscheduled: 0
  numberReady: 3
  observedGeneration: 2
  updatedNumberScheduled: 3

Is this simply the result of a misconfiguration, or is it a bug? Thanks for your help.

Best Answer

Here is what I did to get it working (a sketch of these changes follows the list below):

1. Set the --enable-aggregator-routing=true flag in the kube-apiserver manifest.

2. Set the following flags in metrics-server-deployment.yaml:

- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP

3. Set hostNetwork: true in metrics-server-deployment.yaml.
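
For clarity, a hedged sketch of what these three changes look like with the usual kubeadm and metrics-server manifest layout (paths and field placement may differ in your manifests):

# 1. /etc/kubernetes/manifests/kube-apiserver.yaml on the master (static pod manifest),
#    add to the existing kube-apiserver command:
#      - --enable-aggregator-routing=true

# 2./3. metrics-server-deployment.yaml, spec.template.spec of the Deployment:
spec:
  template:
    spec:
      hostNetwork: true
      containers:
      - name: metrics-server
        args:
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP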

Regarding "api - Impaired/delayed connections to cluster IPs from the k8s master node", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59581669/
