
google-kubernetes-engine - Liveness Probe, Readiness Probe not called within the expected duration


On GKE, I am trying to use readiness/liveness probes and publish alerts with Monitoring: https://cloud.google.com/monitoring/alerts/using-alerting-ui
As a test, I created a pod with readiness/liveness probes. As I expected, every probe check failed.

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 0
      periodSeconds: 10
      timeoutSeconds: 10
      successThreshold: 1
      failureThreshold: 3
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 20
      periodSeconds: 60
      timeoutSeconds: 30
      successThreshold: 1
      failureThreshold: 3
Checking the GCP logs, at first both kinds of error logs appeared at intervals matching periodSeconds.
Readiness probe: every 10 seconds

2021-02-21 13:26:30.000 JST  Readiness probe failed: HTTP probe failed with statuscode: 500

2021-02-21 13:26:40.000 JST  Readiness probe failed: HTTP probe failed with statuscode: 500


Liveness probe: every 1 minute

2021-02-21 13:25:40.000 JST  Liveness probe failed: HTTP probe failed with statuscode: 500

2021-02-21 13:26:40.000 JST  Liveness probe failed: HTTP probe failed with statuscode: 500


However, after running this pod for a few minutes:
  • the liveness probe check is not called anymore
  • the readiness probe check is still called, but the interval became longer (the maximum interval looks like about 10 minutes)
  • $ kubectl get event
    LAST SEEN TYPE REASON OBJECT MESSAGE
    30m Normal Pulling pod/liveness-http Pulling image "k8s.gcr.io/liveness"
    25m Warning Unhealthy pod/liveness-http Readiness probe failed: HTTP probe failed with statuscode: 500
    20m Warning BackOff pod/liveness-http Back-off restarting failed container
    20m Normal Scheduled pod/liveness-http Successfully assigned default/liveness-http to gke-cluster-default-pool-8bc9c75c-rfgc
    17m Normal Pulling pod/liveness-http Pulling image "k8s.gcr.io/liveness"
    17m Normal Pulled pod/liveness-http Successfully pulled image "k8s.gcr.io/liveness"
    17m Normal Created pod/liveness-http Created container liveness
    20m Normal Started pod/liveness-http Started container liveness
    4m59s Warning Unhealthy pod/liveness-http Readiness probe failed: HTTP probe failed with statuscode: 500
    17m Warning Unhealthy pod/liveness-http Liveness probe failed: HTTP probe failed with statuscode: 500
    17m Normal Killing pod/liveness-http Container liveness failed liveness probe, will be restarted

    In my plan, I will create an alert policy whose condition is something like:
  • if a liveness probe error happens 3 times in 3 minutes

    However, such a policy does not work if the probe checks are not called as I expect; in addition, the alert gets resolved even though the pod is not running.

    Why didn't the Liveness probe run, and why did the interval of the Readiness probe change?
    Note: if there is another good alert policy for checking the liveness of pods, I do not care about this behavior. I would appreciate any advice on which alert policy is best suited for checking pods.

    Best Answer

    Background
    In the Configure Liveness, Readiness and Startup Probes documentation you can find the following information:

    The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.

    The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.


    Since the GKE master is managed by Google, you won't find kubelet logs with the CLI (you may try to use Stackdriver). I tested this on a Kubeadm cluster with the kubelet verbosity level set to 8.
    When you use $ kubectl get events you only receive events from the last hour (this can be changed in the Kubernetes settings on Kubeadm, but I don't think it can be changed in GKE, because the master is managed by Google).
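    As an illustration only: on a kubeadm cluster the event retention is controlled by the kube-apiserver --event-ttl flag (default 1h), so it could be raised through the ClusterConfiguration. The sketch below uses an illustrative 24h value; GKE does not expose this setting.
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        # keep events for 24 hours instead of the default 1 hour (illustrative value)
        event-ttl: "24h"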
    $ kubectl get events
    LAST SEEN TYPE REASON OBJECT MESSAGE
    37m Normal Starting node/kubeadm Starting kubelet.
    ...
    33m Normal Scheduled pod/liveness-http Successfully assigned default/liveness-http to kubeadm
    33m Normal Pulling pod/liveness-http Pulling image "k8s.gcr.io/liveness"
    33m Normal Pulled pod/liveness-http Successfully pulled image "k8s.gcr.io/liveness" in 893.953679ms
    33m Normal Created pod/liveness-http Created container liveness
    33m Normal Started pod/liveness-http Started container liveness
    3m12s Warning Unhealthy pod/liveness-http Readiness probe failed: HTTP probe failed with statuscode: 500
    30m Warning Unhealthy pod/liveness-http Liveness probe failed: HTTP probe failed with statuscode: 500
    8m17s Warning BackOff pod/liveness-http Back-off restarting failed container
    The same command executed again after ~1 hour:
    $ kubectl get events
    LAST SEEN TYPE REASON OBJECT MESSAGE
    33s Normal Pulling pod/liveness-http Pulling image "k8s.gcr.io/liveness"
    5m40s Warning Unhealthy pod/liveness-http Readiness probe failed: HTTP probe failed with statuscode: 500
    15m Warning BackOff pod/liveness-http Back-off restarting failed container
    Tests
    The Readiness Probe check was executed every 10 seconds, for more than an hour.
    Mar 09 14:48:34 kubeadm kubelet[3855]: I0309 14:48:34.222085    3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
    Mar 09 14:48:44 kubeadm kubelet[3855]: I0309 14:48:44.221782 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
    Mar 09 14:48:54 kubeadm kubelet[3855]: I0309 14:48:54.221828 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
    ...
    Mar 09 15:01:34 kubeadm kubelet[3855]: I0309 15:01:34.222491 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
    Mar 09 15:01:44 kubeadm kubelet[3855]: I0309 15:01:44.221877 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
    Mar 09 15:01:54 kubeadm kubelet[3855]: I0309 15:01:54.221976 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
    ...
    Mar 09 15:10:14 kubeadm kubelet[3855]: I0309 15:10:14.222163 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
    Mar 09 15:10:24 kubeadm kubelet[3855]: I0309 15:10:24.221744 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
    Mar 09 15:10:34 kubeadm kubelet[3855]: I0309 15:10:34.223877 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
    ...
    Mar 09 16:04:14 kubeadm kubelet[3855]: I0309 16:04:14.222853 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
    Mar 09 16:04:24 kubeadm kubelet[3855]: I0309 16:04:24.222531 3855 prober.go:117] Readiness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
    There are also Liveness probe entries:
    Mar 09 16:12:58 kubeadm kubelet[3855]: I0309 16:12:58.462878    3855 prober.go:117] Liveness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
    Mar 09 16:13:58 kubeadm kubelet[3855]: I0309 16:13:58.462906 3855 prober.go:117] Liveness probe for "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a):liveness" failed (failure): HTTP probe failed with statuscode: 500
    Mar 09 16:14:58 kubeadm kubelet[3855]: I0309 16:14:58.465470 3855 kuberuntime_manager.go:656] Container "liveness" ({"docker" "95567f85708ffac8b34b6c6f2bdb49d8eb57e7704b7b416083c7f296dd40cd0b"}) of pod liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a): Container liveness failed liveness probe, will be restarted
    Mar 09 16:14:58 kubeadm kubelet[3855]: I0309 16:14:58.465587 3855 kuberuntime_manager.go:712] Killing unwanted container "liveness"(id={"docker" "95567f85708ffac8b34b6c6f2bdb49d8eb57e7704b7b416083c7f296dd40cd0b"}) for pod "liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a)"
    Total time of the test:
    $ kubectl get po -w
    NAME READY STATUS RESTARTS AGE
    liveness-http 0/1 Running 21 99m
    liveness-http 0/1 CrashLoopBackOff 21 101m
    liveness-http 0/1 Running 22 106m
    liveness-http 1/1 Running 22 106m
    liveness-http 0/1 Running 22 106m
    liveness-http 0/1 Running 23 109m
    liveness-http 1/1 Running 23 109m
    liveness-http 0/1 Running 23 109m
    liveness-http 0/1 CrashLoopBackOff 23 112m
    liveness-http 0/1 Running 24 117m
    liveness-http 1/1 Running 24 117m
    liveness-http 0/1 Running 24 117m
    Conclusion

    Liveness probe check was not called anymore

    The liveness check is created when Kubernetes creates the pod, and is re-created each time the Pod is restarted. In your configuration you set initialDelaySeconds: 20, so after the pod is created Kubernetes will wait 20 seconds and then call the liveness probe 3 times (as you set failureThreshold: 3). With periodSeconds: 60 this puts the probes roughly one minute apart after each container start, which matches the liveness failures at 16:12:58 and 16:13:58 and the restart at 16:14:58 shown in the Tests section above. After 3 failures, Kubernetes restarts the pod according to its RestartPolicy. You will also be able to find this in the logs:
    Mar 09 16:14:58 kubeadm kubelet[3855]: I0309 16:14:58.465470 3855 kuberuntime_manager.go:656] Container "liveness" ({"docker" "95567f85708ffac8b34b6c6f2bdb49d8eb57e7704b7b416083c7f296dd40cd0b"}) of pod liveness-http_default(8c87a08e-34aa-4bb1-be9b-fdca39a4562a): Container liveness failed liveness probe, will be restarted
    Why the restart? The answer can be found in Container probes.

    livenessProbe: Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy.

    The default Restart Policy in GKE is Always. So your pod will be restarted over and over again.
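    Below is a minimal sketch of the same pod with restartPolicy spelled out explicitly. Always is what Kubernetes/GKE already uses by default for Pods; Never and OnFailure are the other accepted values, in case you want to stop the restart loop while debugging:
    apiVersion: v1
    kind: Pod
    metadata:
      name: liveness-http
    spec:
      # Always is the default for Pods; every completed liveness-failure sequence triggers a restart
      restartPolicy: Always
      containers:
      - name: liveness
        image: k8s.gcr.io/liveness
        args:
        - /server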

    Readiness probe check was called but the interval became long (maximum interval looks to be about 10 minutes)


    I think you reached this conclusion based on $ kubectl get events and $ kubectl describe po. In both cases events are removed after 1 hour by default. In my Tests section you can see Readiness probe entries from 14:48:34 until 16:04:24, so Kubernetes called the Readiness Probe every 10 seconds the whole time.

    Why didn't the Liveness probe run, and why did the interval of the Readiness probe change?


    As I showed in the Tests section, the Readiness probe did not change. What was misleading here was relying on $ kubectl get events. As for the Liveness Probe, it is still called, but only 3 times after the pod is created/restarted. I have also included the output of $ kubectl get po -w; when the pod is re-created, you can find those liveness probes in the kubelet logs.

    In my plan, I would create alert policy, whose condition is like:

    • if liveness probe error happens 3 times in 3 minutes

    If the liveness probe fails 3 times, then with your current setup it will restart the pod. In that situation you can create an alert based on each restart:
    Metric: kubernetes.io/container/restart_count
    Resource type: k8s_container
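    A rough sketch of such a policy (not a verified configuration): the file below could be fed to gcloud alpha monitoring policies create --policy-from-file=restart-policy.yaml; the display names, threshold, duration and alignment period are illustrative, and you would still need to attach notification channels:
    combiner: OR
    displayName: "liveness-http container restarts"
    conditions:
    - displayName: "restart_count increased"
      conditionThreshold:
        # fires when the container restarted at least once within the alignment window
        filter: >
          metric.type="kubernetes.io/container/restart_count"
          AND resource.type="k8s_container"
          AND resource.labels.container_name="liveness"
        comparison: COMPARISON_GT
        thresholdValue: 0
        duration: 0s
        aggregations:
        - alignmentPeriod: 180s
          perSeriesAligner: ALIGN_DELTA
    With ALIGN_DELTA over a 3-minute alignment period, the condition fires whenever the container restarted in the last 3 minutes, which roughly corresponds to the "3 liveness probe errors in 3 minutes" intent, because 3 consecutive liveness failures produce exactly one restart.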
    You can find some useful information about Monitoring alerts in Stack Overflow threads such as:
  • Monitoring and alerting on pod status or restart with Google Container Engine (GKE) and Stackdriver
  • Can I use Google Cloud Monitoring to monitor a failing Container / Pod?
    A similar question about google-kubernetes-engine - Liveness Probe, Readiness Probe not called within the expected duration can be found on Stack Overflow: https://stackoverflow.com/questions/66299478/
