gpt4 book ai didi

kubernetes Pod 的 readinessProbe 出错但端点未从服务中删除

转载 作者:行者123 更新时间:2023-12-02 11:47:32 27 4
gpt4 key购买 nike

我在 Kubernetes 1.10.111 上运行 Spinnaker。 Spinnaker 服务之一是运行名为 Clouddriver 的服务的 Pod。这个 Pod 运行良好,但随后 readinessProbe 开始不断出错。 Kubernetes 文档说

readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod.

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes

但此 Pod 的 IP 仍在服务的端点中。为什么?

Clouddriver Pod YAML

kubectl -n spinnaker-test get pods spin-clouddriver-5559d44484-mp8q9 -o yaml

apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: spotify.backend-service
creationTimestamp: 2019-02-15T20:46:38Z
generateName: spin-clouddriver-5559d44484-
labels:
app: spin
app.kubernetes.io/managed-by: halyard
app.kubernetes.io/name: clouddriver
app.kubernetes.io/part-of: spinnaker
app.kubernetes.io/version: 1.12.1
cluster: spin-clouddriver
pod-template-hash: "1115800040"
name: spin-clouddriver-5559d44484-mp8q9
namespace: spinnaker-test
ownerReferences:
- apiVersion: extensions/v1beta1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: spin-clouddriver-5559d44484
uid: ce79561c-3161-11e9-acdf-42010a800082
resourceVersion: "53541277"
selfLink: /api/v1/namespaces/spinnaker-test/pods/spin-clouddriver-5559d44484-mp8q9
uid: caa66d7c-3162-11e9-acdf-42010a800082
spec:
containers:
- env:
- name: JAVA_OPTS
value: -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2
- name: SPRING_PROFILES_ACTIVE
value: local
image: gcr.io/spinnaker-marketplace/clouddriver:4.3.1-20190130095322
imagePullPolicy: IfNotPresent
lifecycle: {}
name: clouddriver
ports:
- containerPort: 7002
protocol: TCP
readinessProbe:
exec:
command:
- wget
- --no-check-certificate
- --spider
- -q
- http://localhost:7002/health
failureThreshold: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: "20"
memory: 5000Mi
requests:
cpu: "20"
memory: 5000Mi
securityContext:
allowPrivilegeEscalation: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /opt/spinnaker/config
name: spin-clouddriver-files-1952526246
- mountPath: /home/halyard/.hal/k8s-spinnaker/staging/dependencies
name: spin-clouddriver-files-1757773194
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-w2lt5
readOnly: true
dnsPolicy: ClusterFirst
nodeName: gke-production-us-ce-terraform-201812-d63606d6-9vq9
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 720
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: spin-clouddriver-files-1952526246
secret:
defaultMode: 420
secretName: spin-clouddriver-files-1952526246
- name: spin-clouddriver-files-1757773194
secret:
defaultMode: 420
secretName: spin-clouddriver-files-1757773194
- name: default-token-w2lt5
secret:
defaultMode: 420
secretName: default-token-w2lt5
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2019-02-15T20:46:38Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2019-02-15T20:53:40Z
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2019-02-15T20:46:38Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://3509b48511b1ea7bc97812cb82831c559d9410cb9eaaa26b4f492d881603fb31
image: gcr.io/spinnaker-marketplace/clouddriver:4.3.1-20190130095322
imageID: docker-pullable://gcr.io/spinnaker-marketplace/clouddriver@sha256:466228b97b8c4a61a0270c53ae4c397eb04bc3661bc4f1ee9ef4d5fce70d187d
lastState: {}
name: clouddriver
ready: true
restartCount: 0
state:
running:
startedAt: 2019-02-15T20:47:26Z
hostIP: 10.178.32.98
phase: Running
podIP: 10.179.34.24
qosClass: Guaranteed
startTime: 2019-02-15T20:46:38Z

对 Pod 的描述表明 readinessProbe 已经连续出错超过一天了。

kubectl -n spinnaker-test describe pods spin-clouddriver-5559d44484-mp8q9

Name: spin-clouddriver-5559d44484-mp8q9
Namespace: spinnaker-test
Node: gke-production-us-ce-terraform-201812-d63606d6-9vq9/10.178.32.98
Start Time: Fri, 15 Feb 2019 15:46:38 -0500
Labels: app=spin
app.kubernetes.io/managed-by=halyard
app.kubernetes.io/name=clouddriver
app.kubernetes.io/part-of=spinnaker
app.kubernetes.io/version=1.12.1
cluster=spin-clouddriver
pod-template-hash=1115800040
Annotations: kubernetes.io/psp=spotify.backend-service
Status: Running
IP: 10.179.34.24
Controlled By: ReplicaSet/spin-clouddriver-5559d44484
Containers:
clouddriver:
Container ID: docker://3509b48511b1ea7bc97812cb82831c559d9410cb9eaaa26b4f492d881603fb31
Image: gcr.io/spinnaker-marketplace/clouddriver:4.3.1-20190130095322
Image ID: docker-pullable://gcr.io/spinnaker-marketplace/clouddriver@sha256:466228b97b8c4a61a0270c53ae4c397eb04bc3661bc4f1ee9ef4d5fce70d187d
Port: 7002/TCP
Host Port: 0/TCP
State: Running
Started: Fri, 15 Feb 2019 15:47:26 -0500
Ready: True
Restart Count: 0
Limits:
cpu: 20
memory: 5000Mi
Requests:
cpu: 20
memory: 5000Mi
Readiness: exec [wget --no-check-certificate --spider -q http://localhost:7002/health] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
JAVA_OPTS: -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2
SPRING_PROFILES_ACTIVE: local
Mounts:
/home/halyard/.hal/k8s-spinnaker/staging/dependencies from spin-clouddriver-files-1757773194 (rw)
/opt/spinnaker/config from spin-clouddriver-files-1952526246 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-w2lt5 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
spin-clouddriver-files-1952526246:
Type: Secret (a volume populated by a Secret)
SecretName: spin-clouddriver-files-1952526246
Optional: false
spin-clouddriver-files-1757773194:
Type: Secret (a volume populated by a Secret)
SecretName: spin-clouddriver-files-1757773194
Optional: false
default-token-w2lt5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-w2lt5
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 3m (x321 over 1d) kubelet, gke-production-us-ce-terraform-201812-d63606d6-9vq9 Readiness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded

但服务在其端点中仍然具有 Pod 的 IP 10.179.34.24。

kubectl -n spinnaker-test describe services spin-clouddriver

Name: spin-clouddriver
Namespace: spinnaker-test
Labels: app=spin
cluster=spin-clouddriver
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"spin","cluster":"spin-clouddriver"},"name":"spin-clouddriver","namesp...
Selector: app=spin,cluster=spin-clouddriver
Type: ClusterIP
IP: 10.178.65.100
Port: <unset> 7002/TCP
TargetPort: 7002/TCP
Endpoints: 10.179.34.24:7002
Session Affinity: None
Events: <none>

kubectl -n spinnaker-test describe endpoints spin-clouddriver

Name: spin-clouddriver
Namespace: spinnaker-test
Labels: app=spin
cluster=spin-clouddriver
Annotations: <none>
Subsets:
Addresses: 10.179.34.24
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
<unset> 7002 TCP

Events: <none>

脚注

  1. 确切地说是 GKE 1.10.11-gke.1,但事实上它是 GKE 并不重要。

最佳答案

kubelet 的探测可以以三种状态之一结束:

  • 成功
  • 失败(命令返回非 0 退出代码)
  • 错误(命令在超时之前没有返回,命令在容器内不存在等)

Here is the code (in 1.10.11)记录事件 probe errored 的位置。请注意 err != nil

Here is the code调用上述函数 - 当 err != nil(探测器返回错误)时,结果将被丢弃。

只有失败的探测才会真正导致 pod 的就绪状态发生变化。

关于kubernetes Pod 的 readinessProbe 出错但端点未从服务中删除,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54842616/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com