gpt4 book ai didi

memory - 我的机器有足够的内存,但是kubernetes无法调度pod并指示内存不足

转载 作者:行者123 更新时间:2023-12-02 12:28:09 31 4
gpt4 key购买 nike

我有一个1.16.2版本的kubernetes集群。当我在副本为1的群集中部署所有服务时,它可以正常工作。然后,我将所有服务的副本都缩放到2并 checkout 。发现某些服务运行正常,但某些状态处于挂起状态。
当我kubectl描述一个Pending Pane 时,我收到如下消息

[root@runsdata-bj-01 society-training-service-v1-0]# kcd society-resident-service-v3-0-788446c49b-rzjsx
Name: society-resident-service-v3-0-788446c49b-rzjsx
Namespace: runsdata
Priority: 0
Node: <none>
Labels: app=society-resident-service-v3-0
pod-template-hash=788446c49b
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/society-resident-service-v3-0-788446c49b
Containers:
society-resident-service-v3-0:
Image: docker.ssiid.com/society-resident-service:3.0.33
Port: 8231/TCP
Host Port: 0/TCP
Limits:
cpu: 1
memory: 4Gi
Requests:
cpu: 200m
memory: 2Gi
Liveness: http-get http://:8231/actuator/health delay=600s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:8231/actuator/health delay=30s timeout=5s period=10s #success=1 #failure=3
Environment:
spring_profiles_active: production
TZ: Asia/Hong_Kong
JAVA_OPTS: -Djgroups.use.jdk_logger=true -Xmx4000M -Xms4000M -Xmn600M -XX:PermSize=500M -XX:MaxPermSize=500M -Xss384K -XX:+DisableExplicitGC -XX:SurvivorRatio=1 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSClassUnloadingEnabled -XX:LargePageSizeInBytes=128M -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+PrintClassHistogram -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -Xloggc:log/gc.log
Mounts:
/data/storage from nfs-data-storage (rw)
/opt/security from security (rw)
/var/log/runsdata from log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from application-token-vgcvb (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
log:
Type: HostPath (bare host directory volume)
Path: /log/runsdata
HostPathType:
security:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-security-claim
ReadOnly: false
nfs-data-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-storage-claim
ReadOnly: false
application-token-vgcvb:
Type: Secret (a volume populated by a Secret)
SecretName: application-token-vgcvb
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler 0/4 nodes are available: 4 Insufficient memory.

从下面可以看到我的机器还剩下2G以上的内存。
[root@runsdata-bj-01 society-training-service-v1-0]# kcp |grep Pending
society-insurance-foundation-service-v2-0-7697b9bd5b-7btq6 0/1 Pending 0 60m
society-notice-service-v1-0-548b8d5946-c5gzm 0/1 Pending 0 60m
society-online-business-service-v2-1-7f897f564-phqjs 0/1 Pending 0 60m
society-operation-gateway-7cf86b77bd-lmswm 0/1 Pending 0 60m
society-operation-user-service-v1-1-755dcff964-dr9mj 0/1 Pending 0 60m
society-resident-service-v3-0-788446c49b-rzjsx 0/1 Pending 0 60m
society-training-service-v1-0-774f8c5d98-tl7vq 0/1 Pending 0 60m
society-user-service-v3-0-74865dd9d7-t9fwz 0/1 Pending 0 60m
traefik-ingress-controller-8688cccf79-5gkjg 0/1 Pending 0 60m
[root@runsdata-bj-01 society-training-service-v1-0]# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
192.168.0.94 384m 9% 11482Mi 73%
192.168.0.95 399m 9% 11833Mi 76%
192.168.0.96 399m 9% 11023Mi 71%
192.168.0.97 457m 11% 10782Mi 69%
[root@runsdata-bj-01 society-training-service-v1-0]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
192.168.0.94 Ready <none> 8d v1.16.2
192.168.0.95 Ready <none> 8d v1.16.2
192.168.0.96 Ready <none> 8d v1.16.2
192.168.0.97 Ready <none> 8d v1.16.2
[root@runsdata-bj-01 society-training-service-v1-0]#

这是所有4个节点的描述
[root@runsdata-bj-01 frontend]#kubectl describe node 192.168.0.94
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1930m (48%) 7600m (190%)
memory 9846Mi (63%) 32901376Ki (207%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
[root@runsdata-bj-01 frontend]#kubectl describe node 192.168.0.95
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1670m (41%) 6600m (165%)
memory 7196Mi (46%) 21380Mi (137%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.96
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2610m (65%) 7 (175%)
memory 9612Mi (61%) 19960Mi (128%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.97
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2250m (56%) 508200m (12705%)
memory 10940Mi (70%) 28092672Ki (176%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>

以及所有4个节点的内存:
[root@runsdata-bj-00 ~]# free -h
total used free shared buff/cache available
Mem: 15G 2.8G 6.7G 2.1M 5.7G 11G
Swap: 0B 0B 0B
[root@runsdata-bj-01 frontend]# free -h
total used free shared buff/cache available
Mem: 15G 7.9G 3.7G 2.4M 3.6G 6.8G
Swap: 0B 0B 0B
[root@runsdata-bj-02 ~]# free -h
total used free shared buff/cache available
Mem: 15G 5.0G 2.9G 3.9M 7.4G 9.5G
Swap: 0B 0B 0B
[root@runsdata-bj-03 ~]# free -h
total used free shared buff/cache available
Mem: 15G 6.5G 2.2G 2.3M 6.6G 8.2G
Swap: 0B 0B 0B

这是kube-scheduler日志:
[root@runsdata-bj-01 log]# cat messages|tail -n 5000|grep kube-scheduler
Apr 17 14:31:24 runsdata-bj-01 kube-scheduler: E0417 14:31:24.404442 12740 factory.go:585] pod is already present in the activeQ
Apr 17 14:31:25 runsdata-bj-01 kube-scheduler: E0417 14:31:25.490310 12740 factory.go:585] pod is already present in the backoffQ
Apr 17 14:31:25 runsdata-bj-01 kube-scheduler: E0417 14:31:25.873292 12740 factory.go:585] pod is already present in the backoffQ
Apr 18 21:44:18 runsdata-bj-01 etcd: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-scheduler\" " with result "range_response_count:1 size:440" took too long (100.521269ms) to execute
Apr 18 21:59:40 runsdata-bj-01 kube-scheduler: E0418 21:59:40.050852 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:07 runsdata-bj-01 kube-scheduler: E0418 22:03:07.069465 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:07 runsdata-bj-01 kube-scheduler: E0418 22:03:07.950254 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:08 runsdata-bj-01 kube-scheduler: E0418 22:03:08.567290 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:09 runsdata-bj-01 kube-scheduler: E0418 22:03:09.152812 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:09 runsdata-bj-01 kube-scheduler: E0418 22:03:09.344902 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:04:32 runsdata-bj-01 kube-scheduler: E0418 22:04:32.969606 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:09:51 runsdata-bj-01 kube-scheduler: E0418 22:09:51.366877 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:32:16 runsdata-bj-01 kube-scheduler: E0418 22:32:16.430976 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:32:16 runsdata-bj-01 kube-scheduler: E0418 22:32:16.441182 12740 factory.go:585] pod is already present in the activeQ

我搜索了google和stackoverflow,但找不到解决方案。
谁能帮我 ?

最佳答案

Kubernetes保留节点稳定性而不是资源供应,可用内存不是基于free -m命令计算的,如文档所述:

The value for memory.available is derived from the cgroupfs instead of tools like free -m. This is important because free -m does not work in a container, and if users use the node allocatable feature, out of resource decisions are made local to the end user Pod part of the cgroup hierarchy as well as the root node. This script reproduces the same set of steps that the kubelet performs to calculate memory.available. The kubelet excludes inactive_file (i.e. # of bytes of file-backed memory on inactive LRU list) from its calculation as it assumes that memory is reclaimable under pressure.



您可以使用上面提到的脚本来检查节点中的可用内存,如果没有可用资源,则需要添加新节点来增加群集大小。

此外,您可以检查文档页面以获取有关 resources limits的更多信息

关于memory - 我的机器有足够的内存,但是kubernetes无法调度pod并指示内存不足,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/61266637/

31 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com