
kubernetes - Kubectl rollout restart for statefulset


According to the kubectl docs, kubectl rollout restart works for deployments, daemonsets and statefulsets. It works as expected for a deployment. But for a statefulset it restarts only one of the two pods.

✗ k rollout restart statefulset alertmanager-main                       (playground-fdp/monitoring)
statefulset.apps/alertmanager-main restarted

✗ k rollout status statefulset alertmanager-main (playground-fdp/monitoring)
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
statefulset rolling update complete 2 pods at revision alertmanager-main-59d7ccf598...

✗ kgp -l app=alertmanager (playground-fdp/monitoring)
NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   2/2     Running   0          21h
alertmanager-main-1   2/2     Running   0          20s

As you can see, the pod alertmanager-main-1 was restarted and its age is 20s. But the other pod in the statefulset, alertmanager-main-0, was not restarted and its age is 21h. Any idea how to restart the statefulset after some configmap used by it has been updated?

[Update 1] Here is the statefulset configuration. As you can see, .spec.updateStrategy.rollingUpdate.partition is not set.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"Alertmanager","metadata":{"annotations":{},"labels":{"alertmanager":"main"},"name":"main","namespace":"monitoring"},"spec":{"baseImage":"10.47.2.76:80/alm/alertmanager","nodeSelector":{"kubernetes.io/os":"linux"},"replicas":2,"securityContext":{"fsGroup":2000,"runAsNonRoot":true,"runAsUser":1000},"serviceAccountName":"alertmanager-main","version":"v0.19.0"}}
  creationTimestamp: "2019-12-02T07:17:49Z"
  generation: 4
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
  ownerReferences:
  - apiVersion: monitoring.coreos.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: Alertmanager
    name: main
    uid: 3e3bd062-6077-468e-ac51-909b0bce1c32
  resourceVersion: "521307"
  selfLink: /apis/apps/v1/namespaces/monitoring/statefulsets/alertmanager-main
  uid: ed4765bf-395f-4d91-8ec0-4ae23c812a42
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      alertmanager: main
      app: alertmanager
  serviceName: alertmanager-operated
  template:
    metadata:
      creationTimestamp: null
      labels:
        alertmanager: main
        app: alertmanager
    spec:
      containers:
      - args:
        - --config.file=/etc/alertmanager/config/alertmanager.yaml
        - --cluster.listen-address=[$(POD_IP)]:9094
        - --storage.path=/alertmanager
        - --data.retention=120h
        - --web.listen-address=:9093
        - --web.external-url=http://10.47.0.234
        - --web.route-prefix=/
        - --cluster.peer=alertmanager-main-0.alertmanager-operated.monitoring.svc:9094
        - --cluster.peer=alertmanager-main-1.alertmanager-operated.monitoring.svc:9094
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: 10.47.2.76:80/alm/alertmanager:v0.19.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 10
          httpGet:
            path: /-/healthy
            port: web
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: alertmanager
        ports:
        - containerPort: 9093
          name: web
          protocol: TCP
        - containerPort: 9094
          name: mesh-tcp
          protocol: TCP
        - containerPort: 9094
          name: mesh-udp
          protocol: UDP
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /-/ready
            port: web
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        resources:
          requests:
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
        - mountPath: /alertmanager
          name: alertmanager-main-db
      - args:
        - -webhook-url=http://localhost:9093/-/reload
        - -volume-dir=/etc/alertmanager/config
        image: 10.47.2.76:80/alm/configmap-reload:v0.0.1
        imagePullPolicy: IfNotPresent
        name: config-reloader
        resources:
          limits:
            cpu: 100m
            memory: 25Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccount: alertmanager-main
      serviceAccountName: alertmanager-main
      terminationGracePeriodSeconds: 120
      volumes:
      - name: config-volume
        secret:
          defaultMode: 420
          secretName: alertmanager-main
      - emptyDir: {}
        name: alertmanager-main-db
  updateStrategy:
    type: RollingUpdate
status:
  collisionCount: 0
  currentReplicas: 2
  currentRevision: alertmanager-main-59d7ccf598
  observedGeneration: 4
  readyReplicas: 2
  replicas: 2
  updateRevision: alertmanager-main-59d7ccf598
  updatedReplicas: 2

Best Answer

You haven't provided the whole scenario. It might depend on the Readiness Probe and on the Update Strategy.
A StatefulSet's rolling update restarts pods in reverse ordinal order, from n-1 down to 0. Details can be found here.
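A quick way to check which update strategy a StatefulSet uses is shown below (a sketch that assumes the resource name and namespace from your manifest):

kubectl get statefulset alertmanager-main -n monitoring -o jsonpath='{.spec.updateStrategy}'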

Reason 1
A StatefulSet has 4 update strategies:

  • On Delete
  • Rolling Updates
  • Partitions
  • Forced Rollback

  • In the Partitions section of the update strategy documentation, you can find the following information:

    If a partition is specified, all Pods with an ordinal that is greater than or equal to the partition will be updated when the StatefulSet’s .spec.template is updated. All Pods with an ordinal that is less than the partition will not be updated, and, even if they are deleted, they will be recreated at the previous version. If a StatefulSet’s .spec.updateStrategy.rollingUpdate.partition is greater than its .spec.replicas, updates to its .spec.template will not be propagated to its Pods. In most cases you will not need to use a partition, but they are useful if you want to stage an update, roll out a canary, or perform a phased roll out.



    So if somewhere in the StatefulSet you set updateStrategy.rollingUpdate.partition: 1, it will restart only the pods with an ordinal of 1 or higher.
    Example with partition: 3
    NAME    READY   STATUS    RESTARTS   AGE
    web-0   1/1     Running   0          30m
    web-1   1/1     Running   0          30m
    web-2   1/1     Running   0          31m
    web-3   1/1     Running   0          2m45s
    web-4   1/1     Running   0          3m
    web-5   1/1     Running   0          3m13s
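
    For reference, a minimal sketch of how that setting would look in the StatefulSet manifest (the value 3 matches the example above):

    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        partition: 3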

Reason 2
The configuration of the Readiness probe.

If the values of initialDelaySeconds and periodSeconds are high, it may take a while before the other pod is restarted. Details about these parameters can be found here.

In the example below, the pod waits 10 seconds before the readiness probe starts running, and the probe then checks every 2 seconds. Depending on these values, this may be the reason for this behavior.

readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /
    port: 80
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 2
  successThreshold: 1
  timeoutSeconds: 1
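
One way to observe these timings is to watch the pods while the rollout proceeds (using the label and namespace from your question):

kubectl get pods -l app=alertmanager -n monitoring -w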

Reason 3

I see that you have 2 containers in each pod.
NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   2/2     Running   0          21h
alertmanager-main-1   2/2     Running   0          20s

As mentioned in the docs:

    Running - The Pod has been bound to a node, and all of the Containers have been created. At least one Container is still running, or is in the process of starting or restarting.



It would be good to check that everything is fine with the containers (readinessProbe/livenessProbe, restarts, etc.).
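
To inspect per-container state and restart counts, something like the following should help (pod and container names are taken from your output and manifest):

kubectl describe pod alertmanager-main-0 -n monitoring
kubectl logs alertmanager-main-0 -c config-reloader -n monitoring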

Regarding kubernetes - Kubectl rollout restart for statefulset, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59168406/
