google-kubernetes-engine - Pod with a default-class PV takes 30 minutes to upgrade while waiting for disk attachment

Reposted. Author: 行者123. Updated: 2023-12-04 21:28:49

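I deployed a Helm chart (a StatefulSet) with 1 pod and 2 containers, one of which mounts a PV (ReadWriteOnce). On upgrade, the pod takes 30 minutes (7 failed attempts) to come back up, so the service is down for 30 minutes.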

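Some background:

  • The PV uses the default GKE storage class
  • It is a regional GKE cluster with one node in each zone
  • Even though it is not enforced, the pod comes back up on the same node (so, as far as I can see, no node transfer is involved)
  • I had a similar problem on Azure AKS; it also failed 7 times, but much faster, so downtime was minimal, and a node transfer was involved there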

  • Relevant parts of the YAML file:

    # inside the container definition
    volumeMounts:
    - mountPath: /app/data
      name: prod-data

    # inside the StatefulSet spec
    volumeClaimTemplates:
    - metadata:
        creationTimestamp: null
        name: prod-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: standard
        volumeMode: Filesystem

Error message:

    Unable to mount volumes for pod "foo" timeout expired waiting for volumes to attach or mount for pod "foo". list of unmounted volumes=[foo] list of unattached volumes [foo default-token-foo]

Additional context: this is what happens after the StatefulSet upgrade is triggered.

The PVC description shows no change:

    Name:          prod-data-prod-0
    Namespace:     prod
    StorageClass:  standard
    Status:        Bound
    Volume:        pvc-16f49d12-f644-11e9-952a-4201ac100008
    Labels:        app=prod
                   release=prod
    Annotations:   pv.kubernetes.io/bind-completed: yes
                   pv.kubernetes.io/bound-by-controller: yes
                   volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/gce-pd
    Finalizers:    [kubernetes.io/pvc-protection]
    Capacity:      500Gi
    Access Modes:  RWO
    VolumeMode:    Filesystem
    Mounted By:    prod-0
    Events:        <none>

Then the first error:

    Unable to mount volumes for pod "prod-0_prod(89fb0cf5-0008-11ea-b349-4201ac100009)": timeout expired waiting for volumes to attach or mount for pod "prod"/"prod-0". list of unmounted volumes=[prod-data]. list of unattached volumes=[prod-data default-token-4624v]

The PVC description is still the same:

    Name:          prod-data-prod-0
    Namespace:     prod
    StorageClass:  standard
    Status:        Bound
    Volume:        pvc-16f49d12-f644-11e9-952a-4201ac100008
    Labels:        app=prod
                   release=prod
    Annotations:   pv.kubernetes.io/bind-completed: yes
                   pv.kubernetes.io/bound-by-controller: yes
                   volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/gce-pd
    Finalizers:    [kubernetes.io/pvc-protection]
    Capacity:      500Gi
    Access Modes:  RWO
    VolumeMode:    Filesystem
    Mounted By:    prod-0
    Events:        <none>

After the second failed mount, this is the pod description:

    Conditions:
      Type              Status
      Initialized       False
      Ready             False
      ContainersReady   False
      PodScheduled      True
    Volumes:
      vlapi-prod-data:
        Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
        ClaimName:  prod-data-prod-0
        ReadOnly:   false
      default-token-4624v:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-4624v
        Optional:    false
    QoS Class:       Burstable
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s

FailedMount no. 3. No change in the PVC description. Events from the pod description:

    Events:
      Type     Reason       Age                   From                                             Message
      ----     ------       ----                  ----                                             -------
      Normal   Scheduled    8m44s                 default-scheduler                                Successfully assigned prod/prod-0 to gke-vlgke-a-default-pool-312c60b0-p8lb
      Warning  FailedMount  2m8s (x3 over 6m41s)  kubelet, gke-vlgke-a-default-pool-312c60b0-p8lb  Unable to mount volumes for pod "prod-0_prod(89fb0cf5-0008-11ea-b349-4201ac100009)": timeout expired waiting for volumes to attach or mount for pod "prod"/"prod-0". list of unmounted volumes=[prod-data]. list of unattached volumes=[prod-data default-token-4624v]

    Warning FailedMount 48s (x4 over 7m38s)
    Warning FailedMount 13s (x5 over 9m17s)

PV description:

    Name:              pvc-16f49d12-f644-11e9-952a-4201ac100008
    Labels:            failure-domain.beta.kubernetes.io/region=europe-west1
                       failure-domain.beta.kubernetes.io/zone=europe-west1-d
    Annotations:       kubernetes.io/createdby: gce-pd-dynamic-provisioner
                       pv.kubernetes.io/bound-by-controller: yes
                       pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
    Finalizers:        [kubernetes.io/pv-protection]
    StorageClass:      standard
    Status:            Bound
    Claim:             prod/prod-data-prod-0
    Reclaim Policy:    Retain
    Access Modes:      RWO
    VolumeMode:        Filesystem
    Capacity:          500Gi
    Node Affinity:
      Required Terms:
        Term 0:        failure-domain.beta.kubernetes.io/zone in [europe-west1-d]
                       failure-domain.beta.kubernetes.io/region in [europe-west1]
    Message:
    Source:
        Type:       GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
        PDName:     gke-vlgke-a-0d42343f-d-pvc-16f49d12-f644-11e9-952a-4201ac100008
        FSType:     ext4
        Partition:  0
        ReadOnly:   false

    FailedMount 47s (x6 over 12m)
    FailedMount 11s (x7 over 13m)
    FailedMount 33s (x8 over 16m)
    FailedMount 9s (x9 over 18m)
    FailedMount 0s (x10 over 20m)

There are roughly 2m between FailedMount timeouts.

    Events:
      Type     Reason       Age                   From                                             Message
      ----     ------       ----                  ----                                             -------
      Normal   Scheduled    24m                   default-scheduler                                Successfully assigned prod/prod-0 to gke-vlgke-a-default-pool-312c60b0-p8lb
      Warning  FailedMount  2m4s (x10 over 22m)   kubelet, gke-vlgke-a-default-pool-312c60b0-p8lb  Unable to mount volumes for pod "prod-0_prod(89fb0cf5-0008-11ea-b349-4201ac100009)": timeout expired waiting for volumes to attach or mount for pod "prod"/"prod-0". list of unmounted volumes=[prod-data]. list of unattached volumes=[prod-data default-token-4624v]
      Normal   Pulling      11s                   kubelet, gke-gke-default-pool-312c60b0-p8lb      Pulling image "gcr.io/foo-251818/foo:2019-11-05"

The 11th mount attempt works.
There is no change that I can make out in the PVC description.
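The "roughly 2m between timeouts" observation can be cross-checked from the Age column of the events, for example `2m4s (x10 over 22m)`. The sketch below is only a rough estimate: it assumes the first value is the time since the last occurrence and the "over" value is the time since the first one, and kubectl rounds both. The function names `parse_duration` and `mean_retry_interval` are hypothetical helpers, not part of any Kubernetes tooling.

```python
import re

_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text):
    """Convert a kubectl-style age such as '2m4s' or '22m' to seconds."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(int(n) * _SECONDS[u] for n, u in parts)


def mean_retry_interval(age_field):
    """Estimate the mean gap in seconds between repeated events.

    `age_field` looks like '2m4s (x10 over 22m)': last seen 2m4s ago,
    10 occurrences, first seen 22m ago.
    """
    match = re.fullmatch(r"(\S+)\s+\(x(\d+) over (\S+)\)", age_field.strip())
    if match is None:
        raise ValueError(f"unrecognised age field: {age_field!r}")
    last_seen = parse_duration(match.group(1))
    count = int(match.group(2))
    first_seen = parse_duration(match.group(3))
    if count < 2:
        raise ValueError("need at least two occurrences to compute a gap")
    # `count` occurrences are separated by `count - 1` gaps.
    return (first_seen - last_seen) / (count - 1)


print(round(mean_retry_interval("2m4s (x10 over 22m)"), 1))  # prints 132.9
```

About 133 seconds per retry is in line with the asker's estimate, and ten such timeouts account for most of the reported downtime.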

Best answer

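One possibility is that your pod's spec.securityContext.runAsUser and spec.securityContext.fsGroup are different from 0 (non-root), so k8s tries to change the file access permissions of every file on the volume, which takes some time.
Try setting them in your pod definition to:

    spec:
      securityContext:
        runAsUser: 0
        fsGroup: 0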
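To get a feel for why a recursive ownership change could be slow on a 500Gi volume, the following sketch counts how many entries under a directory would need their group changed. It is an illustration only, not what the kubelet actually runs; the function name `entries_needing_group_change` is hypothetical, and the idea is that you would run it against a copy or a mount of the data directory.

```python
import os


def entries_needing_group_change(root, gid):
    """Count files and directories under `root` whose group is not `gid`.

    Each such entry would cost at least one ownership-change call, so the
    total grows with the number of entries rather than with their size.
    """
    pending = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then every file directly inside it.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            # lstat inspects symlinks themselves instead of following them.
            if os.lstat(path).st_gid != gid:
                pending += 1
    return pending
```

A volume holding millions of small files would therefore take far longer to process than one holding a few large files of the same total size.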

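Other possibilities include mismatched attributes (access modes, capacity) between the PVC and the PV. Also, if you have only one PV of this type defined, bringing up multiple pods with RWO PVCs can cause contention.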
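The attribute-mismatch idea can be checked by hand against the describe output in the question. The sketch below does that comparison for access modes and capacity only. The dictionary keys and the function name `binding_mismatches` are made up for illustration; this is a simplification, not the binding logic Kubernetes itself applies.

```python
def binding_mismatches(pvc, pv):
    """Return human-readable reasons why `pv` would not satisfy `pvc`."""
    problems = []
    # Every access mode the claim asks for must be offered by the volume.
    missing = sorted(set(pvc["access_modes"]) - set(pv["access_modes"]))
    if missing:
        problems.append("PV lacks access modes: " + ", ".join(missing))
    # The volume must be at least as large as the request.
    if pv["capacity_gi"] < pvc["request_gi"]:
        problems.append(
            f"PV capacity {pv['capacity_gi']}Gi is below "
            f"the requested {pvc['request_gi']}Gi"
        )
    return problems


# Values copied from the PVC and PV descriptions in the question.
pvc = {"access_modes": ["RWO"], "request_gi": 500}
pv = {"access_modes": ["RWO"], "capacity_gi": 500}
print(binding_mismatches(pvc, pv))  # prints []
```

For the values shown in the question the list is empty, which matches the Bound status of both objects and suggests that a mismatch is unlikely to be the cause here.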

This question (google-kubernetes-engine - Pod with a default-class PV takes 30 minutes to upgrade while waiting for disk attachment) comes from Stack Overflow: https://stackoverflow.com/questions/58711103/
