kubernetes - Unschedulable Kubernetes pods on GCP using Autoscaler

I have a Kubernetes cluster whose pods autoscale using Autopilot. They suddenly stopped autoscaling, and since I'm new to Kubernetes I don't know what to do or what to show from the console to ask for help.

The pods automatically end up in an Unschedulable state; inside the cluster their status is Pending instead of Running, and they won't let me get into them or interact with them.

Also, I can't delete or stop them from the GCP Console. There shouldn't be any memory or CPU shortage, because there isn't much running on the cluster.

Before I ran into this problem, the cluster worked as expected.

Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=odoo-service
                pod-template-hash=5bd88899d7
Annotations:    seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/odoo-cluster-dev-5bd88899d7
Containers:
  odoo-service:
    Image:      us-central1-docker.pkg.dev/adams-dev/adams-odoo/odoo-service:v58
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:                2
      ephemeral-storage:  1Gi
      memory:             8Gi
    Requests:
      cpu:                2
      ephemeral-storage:  1Gi
      memory:             8Gi
    Environment:
      ODOO_HTTP_SOCKET_TIMEOUT:  30
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zqh5r (ro)
  cloud-sql-proxy:
    Image:      gcr.io/cloudsql-docker/gce-proxy:1.17
    Port:       <none>
    Host Port:  <none>
    Command:
      /cloud_sql_proxy
      -instances=adams-dev:us-central1:odoo-test=tcp:5432
    Limits:
      cpu:                1
      ephemeral-storage:  1Gi
      memory:             2Gi
    Requests:
      cpu:                1
      ephemeral-storage:  1Gi
      memory:             2Gi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zqh5r (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-zqh5r:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age                     From                                   Message
  ----     ------             ----                    ----                                   -------
  Normal   NotTriggerScaleUp  28m (x248 over 3h53m)   cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 4 in backoff after failed scale-up, 2 Insufficient cpu, 2 Insufficient memory
  Normal   NotTriggerScaleUp  8m1s (x261 over 3h55m)  cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient memory, 4 in backoff after failed scale-up, 2 Insufficient cpu
  Normal   NotTriggerScaleUp  3m (x1646 over 3h56m)   cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient cpu, 2 Insufficient memory, 4 in backoff after failed scale-up
  Warning  FailedScheduling   20s (x168 over 3h56m)   gke.io/optimize-utilization-scheduler  0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory.


Events:
  Type     Reason             Age                      From                                   Message
  ----     ------             ----                     ----                                   -------
  Normal   NotTriggerScaleUp  28m (x250 over 3h56m)    cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient memory, 4 in backoff after failed scale-up, 2 Insufficient cpu
  Normal   NotTriggerScaleUp  8m2s (x300 over 3h55m)   cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 4 in backoff after failed scale-up, 2 Insufficient cpu, 2 Insufficient memory
  Warning  FailedScheduling   5m21s (x164 over 3h56m)  gke.io/optimize-utilization-scheduler  0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory.
  Normal   NotTriggerScaleUp  3m1s (x1616 over 3h55m)  cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient cpu, 2 Insufficient memory, 4 in backoff after failed scale-up

I don't know how much of this I can debug or fix myself.

Best Answer

The pods cannot be scheduled on any node because none of the nodes have enough free CPU (the FailedScheduling events above also report insufficient memory).
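A quick way to verify this (a minimal sketch; the grep context sizes are just a guess at how many lines of output you need) is to compare the pod's combined requests, 2 + 1 = 3 vCPU and 8Gi + 2Gi = 10Gi of memory across its two containers, against what each node can still hand out:

# Total allocatable CPU/memory per node
kubectl describe nodes | grep -A 7 "Allocatable:"

# What is already reserved by the pods running on each node
kubectl describe nodes | grep -A 10 "Allocated resources:"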

The cluster autoscaler tried to scale up, but it backed off after a failed scale-up attempt, which suggests a possible problem with scaling up the managed instance groups that back the node pool.
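To see how often the scale-up keeps failing and backing off, you can pull the relevant events straight from the cluster (a sketch; the reason names in the filter are taken from the events shown above):

# Scheduling and autoscaler events, newest last
kubectl get events --sort-by=.lastTimestamp | grep -E "NotTriggerScaleUp|FailedScheduling"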

The cluster autoscaler tried to scale up, but no new nodes could be added because a quota limit has been reached.
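If this were a Standard cluster you could inspect the region's Compute Engine quotas directly; with Autopilot the usage you see may not tell the whole story, as noted below. A sketch, with the region and project ID taken from the instance string in the describe output above:

# Prints the region's quotas (CPUS, SSD_TOTAL_GB, IN_USE_ADDRESSES, ...) with current usage and limits
gcloud compute regions describe us-central1 --project=adams-dev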

You can't see the Autopilot GKE VMs that are being counted against your quota.

Try creating the Autopilot cluster in another region. If an Autopilot cluster doesn't meet your needs, go for a Standard cluster instead.
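A minimal sketch of spinning up an Autopilot cluster in a different region (the cluster name and target region here are placeholders; adams-dev is the project ID that appears in the output above):

gcloud container clusters create-auto odoo-cluster-dev-2 --region=us-east1 --project=adams-dev

If you decide to switch to a Standard cluster instead, the equivalent starting point is gcloud container clusters create, where you choose and manage the node pools and machine types yourself.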

Regarding "kubernetes - Unschedulable Kubernetes pods on GCP using Autoscaler", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/70139877/
