
Kubernetes CronJob stops scheduling jobs

Reposted. Author: 行者123. Updated: 2023-12-04 17:29:40

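I'm not sure what I am doing wrong, but I am running into an issue where CronJobs stop scheduling new Jobs. It seems to happen only after a few failed attempts to launch a new Job. In my specific case, the Jobs could not start because the container image could not be pulled.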

I haven't found any setting that would cause this, but I'm no expert on Kubernetes CronJobs. The configuration is below:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    app.kubernetes.io/instance: cron-deal-report
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: cron
    helm.sh/chart: cron-0.1.0
  name: cron-deal-report
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        spec:
          containers:
          - args:
            - -c
            - npm run script
            command:
            - /bin/sh
            env:
            image: <redacted>
            imagePullPolicy: Always
            name: cron
            resources: {}
            securityContext:
              runAsUser: 1000
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: 0/15 * * * *
  successfulJobsHistoryLimit: 3
  suspend: false
status: {}

Best answer

How Kubernetes Jobs handle failures

According to Jobs - Run to Completion - Handling Pod and Container Failures:

An entire Pod can also fail, for a number of reasons, such as when the pod is kicked off the node (node is upgraded, rebooted, deleted, etc.), or if a container of the Pod fails and the .spec.template.spec.restartPolicy = "Never". When a Pod fails, then the Job controller starts a new Pod.



You are using restartPolicy: Never in your jobTemplate, so see the next quote, from Pod backoff failure policy:

There are situations where you want to fail a Job after some amount of retries due to a logical error in configuration etc. To do so, set .spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6. The back-off count is reset if no new failed Pods appear before the Job’s next status check.


.spec.backoffLimit is not defined in your jobTemplate, so it uses the default value (6).

Next, according to Job Termination and Cleanup:

By default, a Job will run uninterrupted unless a Pod fails, at which point the Job defers to the .spec.backoffLimit described above. Another way to terminate a Job is by setting an active deadline. Do this by setting the .spec.activeDeadlineSeconds field of the Job to a number of seconds.



This is what is happening in your case: if your containers fail to pull the image six consecutive times, your Job is considered failed.
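To show where the two fields quoted above live, here is a minimal sketch of a CronJob fragment. It is not a complete manifest: only the fields under discussion are shown. The backoffLimit value is the documented default written out explicitly, and the activeDeadlineSeconds value is an assumption chosen purely for illustration.

```yaml
# Fragment of a CronJob spec (not a complete manifest).
spec:
  jobTemplate:
    spec:
      backoffLimit: 6              # documented default, written out explicitly
      activeDeadlineSeconds: 600   # assumed value: end the Job after 10 minutes
      template:
        spec:
          restartPolicy: Never     # as in the question's configuration
```

Both fields belong to the Job spec (jobTemplate.spec), not to the Pod template nested below it.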

CronJobs

According to Cron Job Limitations:

A cron job creates a job object about once per execution time of its schedule [...]. The Cronjob is only responsible for creating Jobs that match its schedule, and the Job in turn is responsible for the management of the Pods it represents.



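This means that all Pod and container failures should be handled by the Job controller (that is, by adjusting the jobTemplate).

"Retrying" a Job:

You do not need to recreate the CronJob when one of its Jobs fails. You only need to wait for the next scheduled run.

If you want to run a new Job before the next scheduled run, you can create one manually from the CronJob template: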
kubectl create job --from=cronjob/my-cronjob-name my-manually-job-name

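What you should do:

If your containers repeatedly fail to pull the image, you have the following options:
  • Explicitly set backoffLimit and tune it to a higher value.
  • Use restartPolicy: OnFailure for your containers, so the Pod stays on the node and only the container is re-run.
  • Consider using imagePullPolicy: IfNotPresent. If you do not re-tag your images, there is no need to force a fresh pull every time a Job starts.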

Source: a similar question about "Kubernetes CronJob stops scheduling jobs" on Stack Overflow: https://stackoverflow.com/questions/55820054/
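The three options listed above could be sketched as the following CronJob fragment. This is an illustration under assumptions, not a drop-in fix: the backoffLimit value of 10 is arbitrary, the container name and redacted image are carried over from the question, and in practice you would likely pick only the options that fit your case rather than all three.

```yaml
# Fragment of a CronJob spec (not a complete manifest).
spec:
  jobTemplate:
    spec:
      backoffLimit: 10                       # assumed value, higher than the default of 6
      template:
        spec:
          restartPolicy: OnFailure           # re-run the container in the same Pod
          containers:
          - name: cron
            image: <redacted>
            imagePullPolicy: IfNotPresent    # reuse a cached image if one is present on the node
```

Note that IfNotPresent only helps when image tags are immutable; if the same tag is re-pushed with new content, nodes holding a cached copy will keep running the old image.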
