Non-disruptive way of resetting the pod restart count in Kubernetes




Currently our monitoring is designed in such a way that it alerts if any pod restarts more than 50 times.


This is an example of the alert we get:


summary = More than 50 restarts in pod xxx on cluster xxx


In some situations, because of planned maintenance activities, specific application pods get restarted, the restart count goes above 50, and we subsequently get alerted.


This alert stays active until the count resets to 0.


So for non-prod environments we delete the pod with more than 50 restarts; the Deployment then creates a new one whose restart count starts at 0, and we are all happy.
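
For illustration, that workaround is just a plain pod deletion (the pod and namespace names below are placeholders):

    kubectl delete pod <pod-name> -n <namespace>

The owning Deployment immediately schedules a replacement pod, whose restart count starts at 0.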


But we don't have the leverage to perform the same destructive operation of deleting a pod in production. If we don't do it, the restart count stays above 50 and the alert keeps firing, and there is a good chance we miss a genuine alert in the meantime.


How can we overcome this? I assume this is a problem everyone in the Kubernetes world faces.


This is the Prometheus metric we use to track the restart count:


kube_pod_container_status_restarts_total > 50
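
For context, a minimal sketch of how that expression might be wired into a Prometheus alerting rule; the group name, alert name, for: duration and severity label below are illustrative assumptions, not our actual rule:

    groups:
      - name: pod-restarts
        rules:
          - alert: PodRestartCountHigh
            expr: kube_pod_container_status_restarts_total > 50
            for: 5m                 # assumed hold duration
            labels:
              severity: warning     # assumed
            annotations:
              summary: "More than 50 restarts in pod {{ $labels.pod }} on cluster xxx"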


I tried looking through the Kubernetes documentation for a way to reset the pod counter directly in the etcd database, but that does not seem like a recommended approach.


How can we overcome this? What is the best possible approach?


More answers

Why do you consider deleting a pod "destructive"? Normally its controlling Deployment or StatefulSet will recreate it immediately. I might suggest tooling like kubectl rollout restart deployment ... which will delete and recreate all of a Deployment's Pods in one fell swoop.
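
Spelled out with placeholder names, that would be something like:

    kubectl rollout restart deployment <deployment-name> -n <namespace>

Each pod created by the restart starts again with a restart count of 0.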

@DavidMaze As we all know, any change in production that impacts app availability goes through a lot of approvals and scrutiny. In this case, if we try to do a rollout restart, that is technically a new deployment in production, which practically makes it impossible.

Assuming you have more than one replica (you do, right?) neither deleting individual Pods nor kubectl rollout restart should significantly affect availability. The Service will continue to route requests to the Pods that are still running. The Deployment-restart sequence will only recreate some of the Pods at a time, waiting to destroy old ones until new ones pass their liveness and readiness probes.
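
The pace of that replacement is governed by the Deployment's rolling-update strategy; a sketch of the relevant spec fields, with illustrative values:

    spec:
      replicas: 3                # assumes multiple replicas, as noted above
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 0      # never remove an old pod before its replacement is Ready
          maxSurge: 1            # create at most one extra pod at a time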

Keep in mind that in Kubernetes a pod restart is equivalent to a pod delete. Pods are supposed to be mortal and temporary.

Recommended answer

This can only be accomplished by restarting the pod.


Also, a feature request related to this has been rejected:


https://github.com/kubernetes/kubernetes/issues/50375
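
Once a pod has been recreated, its counter starts from 0 again; a quick way to confirm (the pod and namespace names are placeholders):

    kubectl get pod <pod-name> -n <namespace>    # the RESTARTS column should read 0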

