kubernetes - Unschedulable Kubernetes pods on GCP using Autoscaler

I have a Kubernetes cluster whose pods autoscale using Autopilot. They suddenly stopped autoscaling, and since I'm new to Kubernetes I don't know what to do or what to show from the console to ask for help.

The pods automatically end up in an Unschedulable state; inside the cluster their status is Pending instead of Running, and they won't let me get into them or interact with them.

Also, I can't delete or stop them from the GCP Console. There shouldn't be any memory or CPU shortage, because there isn't much running on the cluster.

Before I ran into this problem, the cluster worked as expected.

Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=odoo-service
                pod-template-hash=5bd88899d7
Annotations:    seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/odoo-cluster-dev-5bd88899d7
Containers:
  odoo-service:
    Image:      us-central1-docker.pkg.dev/adams-dev/adams-odoo/odoo-service:v58
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:                2
      ephemeral-storage:  1Gi
      memory:             8Gi
    Requests:
      cpu:                2
      ephemeral-storage:  1Gi
      memory:             8Gi
    Environment:
      ODOO_HTTP_SOCKET_TIMEOUT:  30
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zqh5r (ro)
  cloud-sql-proxy:
    Image:      gcr.io/cloudsql-docker/gce-proxy:1.17
    Port:       <none>
    Host Port:  <none>
    Command:
      /cloud_sql_proxy
      -instances=adams-dev:us-central1:odoo-test=tcp:5432
    Limits:
      cpu:                1
      ephemeral-storage:  1Gi
      memory:             2Gi
    Requests:
      cpu:                1
      ephemeral-storage:  1Gi
      memory:             2Gi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zqh5r (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kube-api-access-zqh5r:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age                     From                                   Message
  ----     ------             ----                    ----                                   -------
  Normal   NotTriggerScaleUp  28m (x248 over 3h53m)   cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 4 in backoff after failed scale-up, 2 Insufficient cpu, 2 Insufficient memory
  Normal   NotTriggerScaleUp  8m1s (x261 over 3h55m)  cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient memory, 4 in backoff after failed scale-up, 2 Insufficient cpu
  Normal   NotTriggerScaleUp  3m (x1646 over 3h56m)   cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient cpu, 2 Insufficient memory, 4 in backoff after failed scale-up
  Warning  FailedScheduling   20s (x168 over 3h56m)   gke.io/optimize-utilization-scheduler  0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory.


Events:
  Type     Reason             Age                      From                                   Message
  ----     ------             ----                     ----                                   -------
  Normal   NotTriggerScaleUp  28m (x250 over 3h56m)    cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient memory, 4 in backoff after failed scale-up, 2 Insufficient cpu
  Normal   NotTriggerScaleUp  8m2s (x300 over 3h55m)   cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 4 in backoff after failed scale-up, 2 Insufficient cpu, 2 Insufficient memory
  Warning  FailedScheduling   5m21s (x164 over 3h56m)  gke.io/optimize-utilization-scheduler  0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory.
  Normal   NotTriggerScaleUp  3m1s (x1616 over 3h55m)  cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 Insufficient cpu, 2 Insufficient memory, 4 in backoff after failed scale-up

I don't know how much of this I can debug or fix myself.

Best Answer

The pods cannot be scheduled on any node because none of the nodes have enough free CPU (the FailedScheduling events above also report insufficient memory).
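A quick way to verify this (a minimal sketch; the grep context sizes are just a guess at how many lines of output you need) is to compare the pod's combined requests, 2 + 1 = 3 vCPU and 8Gi + 2Gi = 10Gi of memory across its two containers, against what each node can still hand out:

# Total allocatable CPU/memory per node
kubectl describe nodes | grep -A 7 "Allocatable:"

# What is already reserved by the pods running on each node
kubectl describe nodes | grep -A 10 "Allocated resources:"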

The cluster autoscaler tried to scale up, but it backed off after a failed scale-up attempt, which suggests a possible problem with scaling up the managed instance groups that back the node pool.
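To see how often the scale-up keeps failing and backing off, you can pull the relevant events straight from the cluster (a sketch; the reason names in the filter are taken from the events shown above):

# Scheduling and autoscaler events, newest last
kubectl get events --sort-by=.lastTimestamp | grep -E "NotTriggerScaleUp|FailedScheduling"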

The cluster autoscaler tried to scale up, but no new nodes could be added because a quota limit has been reached.
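If this were a Standard cluster you could inspect the region's Compute Engine quotas directly; with Autopilot the usage you see may not tell the whole story, as noted below. A sketch, with the region and project ID taken from the instance string in the describe output above:

# Prints the region's quotas (CPUS, SSD_TOTAL_GB, IN_USE_ADDRESSES, ...) with current usage and limits
gcloud compute regions describe us-central1 --project=adams-dev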

You can't see the Autopilot GKE VMs that are being counted against your quota.

Try creating the Autopilot cluster in another region. If an Autopilot cluster doesn't meet your needs, go for a Standard cluster instead.
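A minimal sketch of spinning up an Autopilot cluster in a different region (the cluster name and target region here are placeholders; adams-dev is the project ID that appears in the output above):

gcloud container clusters create-auto odoo-cluster-dev-2 --region=us-east1 --project=adams-dev

If you decide to switch to a Standard cluster instead, the equivalent starting point is gcloud container clusters create, where you choose and manage the node pools and machine types yourself.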

Regarding "kubernetes - Unschedulable Kubernetes pods on GCP using Autoscaler", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/70139877/
