memory - My machine has enough memory, but Kubernetes cannot schedule the pod and reports insufficient memory


Reposted by: 行者123 · Updated: 2023-12-02 12:28:09

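I have a Kubernetes cluster running version 1.16.2. When I deploy all services with 1 replica each, everything works. I then scaled every service to 2 replicas and checked the result: some services run normally, but some pods are stuck in the Pending state.
When I run kubectl describe on one of the Pending pods, I get the following output: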

[root@runsdata-bj-01 society-training-service-v1-0]# kcd society-resident-service-v3-0-788446c49b-rzjsx
Name:           society-resident-service-v3-0-788446c49b-rzjsx
Namespace:      runsdata
Priority:       0
Node:           <none>
Labels:         app=society-resident-service-v3-0
                pod-template-hash=788446c49b
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/society-resident-service-v3-0-788446c49b
Containers:
  society-resident-service-v3-0:
    Image:      docker.ssiid.com/society-resident-service:3.0.33
    Port:       8231/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     1
      memory:  4Gi
    Requests:
      cpu:      200m
      memory:   2Gi
    Liveness:   http-get http://:8231/actuator/health delay=600s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8231/actuator/health delay=30s timeout=5s period=10s #success=1 #failure=3
    Environment:
      spring_profiles_active:  production
      TZ:                      Asia/Hong_Kong
      JAVA_OPTS:               -Djgroups.use.jdk_logger=true -Xmx4000M  -Xms4000M  -Xmn600M  -XX:PermSize=500M  -XX:MaxPermSize=500M  -Xss384K  -XX:+DisableExplicitGC  -XX:SurvivorRatio=1 -XX:+UseConcMarkSweepGC  -XX:+UseParNewGC  -XX:+CMSParallelRemarkEnabled  -XX:+UseCMSCompactAtFullCollection  -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSClassUnloadingEnabled  -XX:LargePageSizeInBytes=128M  -XX:+UseFastAccessorMethods  -XX:+UseCMSInitiatingOccupancyOnly  -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+PrintClassHistogram  -XX:+PrintGCDetails  -XX:+PrintGCTimeStamps  -XX:+PrintHeapAtGC  -Xloggc:log/gc.log
    Mounts:
      /data/storage from nfs-data-storage (rw)
      /opt/security from security (rw)
      /var/log/runsdata from log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from application-token-vgcvb (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  log:
    Type:          HostPath (bare host directory volume)
    Path:          /log/runsdata
    HostPathType:  
  security:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-security-claim
    ReadOnly:   false
  nfs-data-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-storage-claim
    ReadOnly:   false
  application-token-vgcvb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  application-token-vgcvb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/4 nodes are available: 4 Insufficient memory.

As the output below shows, each of my machines still has more than 2 GB of memory left.

[root@runsdata-bj-01 society-training-service-v1-0]# kcp |grep Pending
society-insurance-foundation-service-v2-0-7697b9bd5b-7btq6      0/1     Pending            0          60m
society-notice-service-v1-0-548b8d5946-c5gzm                    0/1     Pending            0          60m
society-online-business-service-v2-1-7f897f564-phqjs            0/1     Pending            0          60m
society-operation-gateway-7cf86b77bd-lmswm                      0/1     Pending            0          60m
society-operation-user-service-v1-1-755dcff964-dr9mj            0/1     Pending            0          60m
society-resident-service-v3-0-788446c49b-rzjsx                  0/1     Pending            0          60m
society-training-service-v1-0-774f8c5d98-tl7vq                  0/1     Pending            0          60m
society-user-service-v3-0-74865dd9d7-t9fwz                      0/1     Pending            0          60m
traefik-ingress-controller-8688cccf79-5gkjg                     0/1     Pending            0          60m
[root@runsdata-bj-01 society-training-service-v1-0]# kubectl top nodes
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
192.168.0.94   384m         9%     11482Mi         73%       
192.168.0.95   399m         9%     11833Mi         76%       
192.168.0.96   399m         9%     11023Mi         71%       
192.168.0.97   457m         11%    10782Mi         69%       
[root@runsdata-bj-01 society-training-service-v1-0]# kubectl get nodes
NAME           STATUS   ROLES    AGE   VERSION
192.168.0.94   Ready    <none>   8d    v1.16.2
192.168.0.95   Ready    <none>   8d    v1.16.2
192.168.0.96   Ready    <none>   8d    v1.16.2
192.168.0.97   Ready    <none>   8d    v1.16.2
[root@runsdata-bj-01 society-training-service-v1-0]#

Here is the description of all 4 nodes:

[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.94
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1930m (48%)   7600m (190%)
  memory             9846Mi (63%)  32901376Ki (207%)
  ephemeral-storage  0 (0%)        0 (0%)
Events:              <none>
[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.95
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1670m (41%)   6600m (165%)
  memory             7196Mi (46%)  21380Mi (137%)
  ephemeral-storage  0 (0%)        0 (0%)
Events:              <none>
[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.96
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                2610m (65%)   7 (175%)
  memory             9612Mi (61%)  19960Mi (128%)
  ephemeral-storage  0 (0%)        0 (0%)
Events:              <none>
[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.97  
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests       Limits
  --------           --------       ------
  cpu                2250m (56%)    508200m (12705%)
  memory             10940Mi (70%)  28092672Ki (176%)
  ephemeral-storage  0 (0%)         0 (0%)
Events:              <none>

And the memory on all 4 nodes:

[root@runsdata-bj-00 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        2.8G        6.7G        2.1M        5.7G         11G
Swap:            0B          0B          0B
[root@runsdata-bj-01 frontend]# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        7.9G        3.7G        2.4M        3.6G        6.8G
Swap:            0B          0B          0B
[root@runsdata-bj-02 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        5.0G        2.9G        3.9M        7.4G        9.5G
Swap:            0B          0B          0B
[root@runsdata-bj-03 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        6.5G        2.2G        2.3M        6.6G        8.2G
Swap:            0B          0B          0B

Here is the kube-scheduler log:

[root@runsdata-bj-01 log]# cat messages|tail -n 5000|grep kube-scheduler
Apr 17 14:31:24 runsdata-bj-01 kube-scheduler: E0417 14:31:24.404442   12740 factory.go:585] pod is already present in the activeQ
Apr 17 14:31:25 runsdata-bj-01 kube-scheduler: E0417 14:31:25.490310   12740 factory.go:585] pod is already present in the backoffQ
Apr 17 14:31:25 runsdata-bj-01 kube-scheduler: E0417 14:31:25.873292   12740 factory.go:585] pod is already present in the backoffQ
Apr 18 21:44:18 runsdata-bj-01 etcd: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-scheduler\" " with result "range_response_count:1 size:440" took too long (100.521269ms) to execute
Apr 18 21:59:40 runsdata-bj-01 kube-scheduler: E0418 21:59:40.050852   12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:07 runsdata-bj-01 kube-scheduler: E0418 22:03:07.069465   12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:07 runsdata-bj-01 kube-scheduler: E0418 22:03:07.950254   12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:08 runsdata-bj-01 kube-scheduler: E0418 22:03:08.567290   12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:09 runsdata-bj-01 kube-scheduler: E0418 22:03:09.152812   12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:09 runsdata-bj-01 kube-scheduler: E0418 22:03:09.344902   12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:04:32 runsdata-bj-01 kube-scheduler: E0418 22:04:32.969606   12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:09:51 runsdata-bj-01 kube-scheduler: E0418 22:09:51.366877   12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:32:16 runsdata-bj-01 kube-scheduler: E0418 22:32:16.430976   12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:32:16 runsdata-bj-01 kube-scheduler: E0418 22:32:16.441182   12740 factory.go:585] pod is already present in the activeQ

I searched Google and Stack Overflow but could not find a solution.
Can anyone help me?

Best answer

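Kubernetes prioritizes node stability over resource provisioning. The memory it considers available is not computed from the free -m command, as the documentation states: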

The value for memory.available is derived from the cgroupfs instead of tools like free -m. This is important because free -m does not work in a container, and if users use the node allocatable feature, out of resource decisions are made local to the end user Pod part of the cgroup hierarchy as well as the root node. This script reproduces the same set of steps that the kubelet performs to calculate memory.available. The kubelet excludes inactive_file (i.e. # of bytes of file-backed memory on inactive LRU list) from its calculation as it assumes that memory is reclaimable under pressure.

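You can use the script mentioned in the quote above to check the available memory on your nodes. If no resources are available, you need to add new nodes to increase the cluster size.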
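The script the documentation refers to is not reproduced on this page. As a rough illustration of the calculation the quote describes, here is a minimal Python sketch: memory.available is the node's capacity minus the working set, where the working set is the cgroup's usage minus inactive file-backed pages. The function names are hypothetical, and the file paths assume a cgroup v1 layout (/sys/fs/cgroup/memory); they will differ on a cgroup v2 host.

```python
from pathlib import Path


def memory_available_bytes(capacity_bytes: int, usage_bytes: int,
                           inactive_file_bytes: int) -> int:
    """Capacity minus working set; working set = usage - inactive file pages."""
    # The working set is floored at zero, as inactive_file can exceed usage.
    working_set = max(usage_bytes - inactive_file_bytes, 0)
    return capacity_bytes - working_set


def read_node_memory_available(cgroup_root: str = "/sys/fs/cgroup/memory") -> int:
    """Read the inputs on a node. The cgroup v1 paths here are an assumption."""
    capacity_kb = 0
    for line in Path("/proc/meminfo").read_text().splitlines():
        if line.startswith("MemTotal:"):
            capacity_kb = int(line.split()[1])  # MemTotal is reported in kB
            break

    usage = int(Path(cgroup_root, "memory.usage_in_bytes").read_text())

    inactive_file = 0
    for line in Path(cgroup_root, "memory.stat").read_text().splitlines():
        key, _, value = line.partition(" ")
        if key == "total_inactive_file":
            inactive_file = int(value)
            break

    return memory_available_bytes(capacity_kb * 1024, usage, inactive_file)
```

Running read_node_memory_available() on a node prints nothing by itself; wrap it in print() and divide by 1024**2 to compare the result in MiB against what free reports. Note that free counts the whole page cache as reclaimable, while this calculation only excludes the inactive file pages.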

In addition, you can check the documentation page for more information about resource limits.
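One point worth keeping in mind when reading that page: the scheduler's resource check works on requests, not on measured usage. A pod fits a node when its memory request, added to the requests of the pods already assigned there, does not exceed the node's allocatable memory; the figures from kubectl top nodes and free are not consulted. The sketch below estimates the unrequested memory on a node from the "Requests" column of kubectl describe node. The helper names are hypothetical, and because the percentage in that output is rounded, the result is only an approximation.

```python
def parse_mi(quantity: str) -> float:
    """Convert a Kubernetes memory quantity with a Ki, Mi or Gi suffix to MiB."""
    factors = {"Ki": 1 / 1024, "Mi": 1.0, "Gi": 1024.0}
    for suffix, factor in factors.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity}")


def estimate_headroom_mi(requested: str, requested_percent: int) -> float:
    """Estimate unrequested memory in MiB from a 'Requests  X (NN%)' entry."""
    requested_mi = parse_mi(requested)
    # The percentage is relative to allocatable, so allocatable ~ X / (NN / 100).
    allocatable_mi = requested_mi / (requested_percent / 100)
    return allocatable_mi - requested_mi


def request_fits(pod_request: str, requested: str, requested_percent: int) -> bool:
    """True if the pod's memory request is within the estimated headroom."""
    return parse_mi(pod_request) <= estimate_headroom_mi(requested, requested_percent)


if __name__ == "__main__":
    # Memory 'Requests' values as pasted in the question for each node.
    nodes = {
        "192.168.0.94": ("9846Mi", 63),
        "192.168.0.95": ("7196Mi", 46),
        "192.168.0.96": ("9612Mi", 61),
        "192.168.0.97": ("10940Mi", 70),
    }
    for name, (requested, percent) in nodes.items():
        headroom = estimate_headroom_mi(requested, percent)
        print(f"{name}: ~{headroom:.0f}Mi unrequested, "
              f"2Gi request fits: {request_fits('2Gi', requested, percent)}")
```

The describe output only reflects the moment it was captured, so for a reliable comparison take the node snapshot while the pods are still Pending, and read the exact allocatable value from the node's Allocatable section rather than deriving it from the rounded percentage.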

The original question, "My machine has enough memory, but Kubernetes cannot schedule the pod and reports insufficient memory", can be found on Stack Overflow: https://stackoverflow.com/questions/61266637/
