
azure - Node pool in AKS not scaling down to 0 nodes?

Reposted. Author: 行者123. Updated: 2023-12-03 06:15:18

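I have two node pools in my AKS cluster: the default node pool and an "application" node pool. I use the default node pool for services such as Airflow, and the application node pool to run ETL jobs. However, the application node pool never scales down to zero, even when I have not scheduled any ETL jobs for many hours.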

I don't understand why. Does anyone have a suggestion as to the root cause?

The cluster is deployed with Terraform. The autoscaler is configured as follows:

auto_scaler_profile {
  # (Optional) Maximum number of seconds the cluster autoscaler waits for pod termination when trying to scale down a node. Defaults to 600.
  max_graceful_termination_sec = 180
  # (Optional) How long after the scale up of AKS nodes the scale down evaluation resumes. Defaults to 10m.
  scale_down_delay_after_add = "3m"
  # (Optional) How long a node should be unneeded before it is eligible for scale down. Defaults to 10m.
  scale_down_unneeded = "3m"
  # (Optional) If true cluster autoscaler will never delete nodes with pods from kube-system (except for DaemonSet or mirror pods). Defaults to true.
  skip_nodes_with_system_pods = false
}

The application node pool is defined as:

resource "azurerm_kubernetes_cluster_node_pool" "main" {
  name                  = "application"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = "Standard_B4ms"
  enable_auto_scaling   = true
  min_count             = 0
  max_count             = 2
  max_pods              = 15

  node_labels = {
    "type" = "application"
  }
}

Here are some relevant details about the AKS cluster:

k top nodes

NAME                                  CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
aks-application-XXXXXXXX-vmss000000   55m          1%     1579Mi          12%
aks-default-XXXXXXXX-vmss000000       677m         17%    7783Mi          61%

az aks nodepool show \
--resource-group <my-rg> \
--cluster-name <my-cluster> \
--name application \
--query "{min: minCount, max: maxCount}"

{
"max": 2,
"min": 0
}

az aks show \
--resource-group <my-rg> \
--name <my-cluster> \
--query autoScalerProfile

{
"balanceSimilarNodeGroups": "false",
"expander": "random",
"maxEmptyBulkDelete": "10",
"maxGracefulTerminationSec": "180",
"maxNodeProvisionTime": "15m",
"maxTotalUnreadyPercentage": "45",
"newPodScaleUpDelay": "0s",
"okTotalUnreadyCount": "3",
"scaleDownDelayAfterAdd": "3m",
"scaleDownDelayAfterDelete": "10s",
"scaleDownDelayAfterFailure": "3m",
"scaleDownUnneededTime": "3m",
"scaleDownUnreadyTime": "20m",
"scaleDownUtilizationThreshold": "0.5",
"scanInterval": "10s",
"skipNodesWithLocalStorage": "true",
"skipNodesWithSystemPods": "false"
}

k get pods --sort-by="{.spec.nodeName}" -A -o wide

NAMESPACE     NAME                                 READY  STATUS   RESTARTS  AGE    NODE
kube-system   azure-ip-masq-agent-XXXXX            1/1    Running  0         3d17h  aks-application-XXXXXXXX-vmss000000
kube-system   metrics-server-XXXXXXXXXX-XXXXX      2/2    Running  0         3d17h  aks-application-XXXXXXXX-vmss000000
kube-system   metrics-server-XXXXXXXXXX-XXXXX      2/2    Running  0         3d17h  aks-application-XXXXXXXX-vmss000000
kube-system   kube-proxy-XXXXX                     1/1    Running  0         3d17h  aks-application-XXXXXXXX-vmss000000
kube-system   csi-blob-node-XXXXX                  3/3    Running  0         3d17h  aks-application-XXXXXXXX-vmss000000
kube-system   csi-azurefile-node-XXXXX             3/3    Running  0         3d17h  aks-application-XXXXXXXX-vmss000000
kube-system   csi-azuredisk-node-XXXXX             3/3    Running  0         3d17h  aks-application-XXXXXXXX-vmss000000
kube-system   cloud-node-manager-XXXXX             1/1    Running  0         3d17h  aks-application-XXXXXXXX-vmss000000
kube-system   cloud-node-manager-XXXXX             1/1    Running  0         3d21h  aks-default-XXXXXXXX-vmss000000
airflow-prod  airflow-pgbouncer-XXXXXXXXXX-XXXXX   2/2    Running  0         3d21h  aks-default-XXXXXXXX-vmss000000
airflow-prod  airflow-triggerer-XXXXXXXXX-XXXXX    1/1    Running  0         3d21h  aks-default-XXXXXXXX-vmss000000
airflow-prod  airflow-webserver-XXXXXXXXX-XXXXX    1/1    Running  0         3d21h  aks-default-XXXXXXXX-vmss000000
airflow-prod  airflow-scheduler-XXXXXXXXX-XXXXX    2/2    Running  0         3d21h  aks-default-XXXXXXXX-vmss000000
kube-system   azure-ip-masq-agent-XXXXX            1/1    Running  0         3d21h  aks-default-XXXXXXXX-vmss000000
airflow-prod  airflow-postgresql-0                 1/1    Running  0         3d17h  aks-default-XXXXXXXX-vmss000000
kube-system   coredns-XXXXXXXXXX-XXXXX             1/1    Running  0         3d17h  aks-default-XXXXXXXX-vmss000000
kube-system   coredns-XXXXXXXXXX-XXXXX             1/1    Running  0         3d17h  aks-default-XXXXXXXX-vmss000000
kube-system   coredns-autoscaler-XXXXXXXXXX-XXXXX  1/1    Running  0         3d17h  aks-default-XXXXXXXX-vmss000000
airflow-prod  airflow-statsd-XXXXXXXX-XXXXX        1/1    Running  0         3d21h  aks-default-XXXXXXXX-vmss000000
kube-system   csi-azuredisk-node-XXXXX             3/3    Running  0         3d21h  aks-default-XXXXXXXX-vmss000000
kube-system   csi-azurefile-node-XXXXX             3/3    Running  0         3d21h  aks-default-XXXXXXXX-vmss000000
kube-system   csi-blob-node-XXXXX                  3/3    Running  0         3d21h  aks-default-XXXXXXXX-vmss000000
kube-system   konnectivity-agent-XXXXXXXXXX-XXXXX  1/1    Running  0         3d17h  aks-default-XXXXXXXX-vmss000000
kube-system   konnectivity-agent-XXXXXXXXXX-XXXXX  1/1    Running  0         3d17h  aks-default-XXXXXXXX-vmss000000
kube-system   kube-proxy-XXXXX                     1/1    Running  0         3d21h  aks-default-XXXXXXXX-vmss000000


Best answer

Some of the system pods prevent the node from being deleted, because there is no node affinity on the system pods.

You can solve this by deploying a dedicated system node pool.
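
As a related sketch rather than the exact fix described above: in the pod listing from the question, the two metrics-server pods are the only non-DaemonSet pods on the application node, and the cluster autoscaler does not count DaemonSet pods when deciding whether a node is unneeded. One possible way to keep such pods on the default (system) pool is to taint the application pool, so that only pods carrying a matching toleration are scheduled there. The taint key and value `type=application` and the explicit `mode = "User"` are assumptions made for illustration; they are not part of the original configuration.

```hcl
# Sketch only: the application pool from the question with an added taint.
# The taint "type=application:NoSchedule" is a hypothetical choice.
resource "azurerm_kubernetes_cluster_node_pool" "main" {
  name                  = "application"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = "Standard_B4ms"
  mode                  = "User" # the default pool remains the system pool
  enable_auto_scaling   = true
  min_count             = 0
  max_count             = 2
  max_pods              = 15

  node_labels = {
    "type" = "application"
  }

  # Intended to keep pods without a matching toleration (for example
  # metrics-server) off this pool so that it can become empty.
  node_taints = ["type=application:NoSchedule"]
}
```

With this in place, the ETL pods would need a toleration for `type=application:NoSchedule` (and, typically, a node selector on the `type = application` label) in order to run on the pool.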

Regarding "azure - Node pool in AKS not scaling down to 0 nodes?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/76312329/
