
kubernetes - How to scale a Kubernetes cluster while limiting GCP costs


We have a GKE cluster set up on Google Cloud Platform.

We have an activity that requires "bursts" of computing power.

Imagine that we usually do an average of 100 computations per hour, and then suddenly we need to be able to process 100,000 of them in under two minutes. Most of the time, however, everything is close to idle.

We don't want to pay for servers that sit idle 99% of the time, and we want to scale the cluster based on actual use (no data persistence is needed; servers can be deleted afterwards). I have looked at the Kubernetes documentation on autoscaling, to add more pods with the HPA and more nodes with the cluster autoscaler.
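
(For reference, the HPA mentioned above can be created with a one-liner like the following; the deployment name "my-app" is only a placeholder:)

    # Keep between 1 and 20 replicas of "my-app", targeting 70% average CPU utilization.
    kubectl autoscale deployment my-app --cpu-percent=70 --min=1 --max=20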

However, neither of these solutions seems to actually lower our costs or improve performance, because they never seem to go beyond the capacity of the GCP plan:

Imagine we have a Google plan with 8 CPUs. My understanding is that if we add more nodes with the cluster autoscaler, then instead of having, say, 2 nodes using 4 CPUs each, we will have 4 nodes using 2 CPUs each; the total available computing power is still 8 CPUs.
The same reasoning goes for the HPA, with more pods instead of more nodes.

If we have a payment plan for 8 CPUs but only use 4 of them, my understanding is that we are still billed for 8, so scaling down isn't really useful.

What we want is for autoscaling to temporarily change our payment plan (imagine going from n1-standard-8 to n1-standard-16) and give us actual new computing power.

I can't believe we are the only ones with this use case, but I cannot find any documentation on it anywhere! Am I misunderstanding something?

Best Answer

TL;DR:

  • Create a small persistent node pool.
  • Create a powerful node pool that can scale down to zero (and stop billing) while not in use.
  • Tools used:
  • GKE's Cluster Autoscaling, node selectors, anti-affinity rules, and taints and tolerations.


  • GKE pricing:
  • From GKE Pricing:

    Starting June 6, 2020, GKE will charge a cluster management fee of $0.10 per cluster per hour. The following conditions apply to the cluster management fee:

    • One zonal cluster per billing account is free.
    • The fee is flat, irrespective of cluster size and topology.
    • Billing is computed on a per-second basis for each cluster. The total amount is rounded to the nearest cent, at the end of each month.
  • From Pricing for Worker Nodes:

    GKE uses Compute Engine instances for worker nodes in the cluster. You are billed for each of those instances according to Compute Engine's pricing, until the nodes are deleted. Compute Engine resources are billed on a per-second basis with a one-minute minimum usage cost.

  • Enter the Cluster Autoscaler:

    automatically resize your GKE cluster’s node pools based on the demands of your workloads. When demand is high, cluster autoscaler adds nodes to the node pool. When demand is low, cluster autoscaler scales back down to a minimum size that you designate. This can increase the availability of your workloads when you need it, while controlling costs.
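
    As a hedged aside (not from the original answer): autoscaling can also be switched on for an existing node pool instead of creating a new one; the pool name below is a placeholder, and the variables are the ones defined in the reproduction section further down.

    # Enable autoscaling on an existing node pool.
    gcloud container clusters update ${GKE_CLUSTER_NAME} \
      --enable-autoscaling --node-pool=default-pool \
      --min-nodes=1 --max-nodes=4 \
      --zone=${GCP_ZONE} --project=${PROJECT_ID}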



  • The Cluster Autoscaler cannot scale the whole cluster down to zero; at least one node must always be available in the cluster to run system pods.
  • Since you already have a persistent workload, this won't be a problem. What we are going to do is create a new node pool:

    A node pool is a group of nodes within a cluster that all have the same configuration. Every cluster has at least one default node pool, but you can add other node pools as needed.

  • In this example I will create two node pools:
  • A default node pool with a fixed size of one node and a small instance size (simulating the cluster you already have).
  • A second node pool with more computing power to run the jobs (I will call it the power pool).
  • Choose the machine type with the power you need to run your AI jobs; in this example I will create an n1-standard-8.
  • This power pool will be set to autoscale, with a maximum of 4 nodes and a minimum of 0 nodes.
  • If you want to add GPUs, you can check out: Guide Scale to almost zero + GPUs (a hedged sketch follows this list).
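  • A hedged sketch of the GPU variant (not part of the reproduction below; the accelerator type and count are placeholders, and GPU nodes additionally need Google's NVIDIA driver-installer DaemonSet, see the guide linked above):

    # GPU power pool with one NVIDIA T4 per node, scaling between 0 and 4 nodes.
    gcloud container node-pools create power-pool-gpu \
      --cluster=${GKE_CLUSTER_NAME} \
      --machine-type=n1-standard-8 \
      --accelerator=type=nvidia-tesla-t4,count=1 \
      --node-labels=load=on-demand \
      --node-taints=reserved-pool=true:NoSchedule \
      --enable-autoscaling --min-nodes=0 --max-nodes=4 \
      --zone=${GCP_ZONE} --project=${PROJECT_ID}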

  • Taints and tolerations:
  • Only jobs related to the AI workload will run on the power pool, because the job pods use a node selector to make sure they are scheduled on power-pool nodes.
  • Set an anti-affinity rule so that two of your training pods cannot be scheduled on the same node (optimizing the price-performance ratio; whether to do this depends on your workload).
  • Add a taint to the power pool to keep other workloads (and system resources) from being scheduled on the autoscalable pool.
  • Add a toleration to the AI jobs so they are allowed to run on those nodes (a quick verification sketch follows this list).
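  • A quick way to verify the setup once the power-pool nodes exist (a sketch, using the label from this example):

    # Nodes that the AI jobs can select:
    kubectl get nodes -l load=on-demand
    # Confirm the taint is present on those nodes:
    kubectl describe nodes -l load=on-demand | grep -i taint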


  • Reproduction:
  • Create the cluster with the persistent default pool:
  • PROJECT_ID="YOUR_PROJECT_ID"  
    GCP_ZONE="CLUSTER_ZONE"
    GKE_CLUSTER_NAME="CLUSTER_NAME"
    GKE_BURST_POOL="power-pool"

    gcloud container clusters create ${GKE_CLUSTER_NAME} \
    --machine-type="n1-standard-1" \
    --num-nodes=1 \
    --zone=${GCP_ZONE} \
    --project=${PROJECT_ID}
  • Create the autoscalable pool:
  • gcloud container node-pools create ${GKE_BURST_POOL} \
    --cluster=${GKE_CLUSTER_NAME} \
    --machine-type=n1-standard-8 \
    --node-labels=load=on-demand \
    --node-taints=reserved-pool=true:NoSchedule \
    --enable-autoscaling \
    --min-nodes=0 \
    --max-nodes=4 \
    --zone=${GCP_ZONE} \
    --project=${PROJECT_ID}
  • Explanation of the parameters:
  • --node-labels=load=on-demand: adds a label to the nodes in the power pool, so that they can be selected in our AI job with a node selector.
  • --node-taints=reserved-pool=true:NoSchedule: adds a taint to the nodes, preventing any other workload from accidentally being scheduled on this node pool.
  • Here you can see the two pools we created: the static pool with 1 node and the autoscalable pool with 0-4 nodes.

  • (Screenshot: the two node pools in the GKE console.)

    Since there is no workload running on the scalable node pool, it shows 0 nodes running (and there is no charge while no node is running).
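  • The same can be checked from the terminal (a small sketch, reusing the variables defined above):

    # Both pools with their node counts and autoscaling settings:
    gcloud container node-pools list \
      --cluster=${GKE_CLUSTER_NAME} --zone=${GCP_ZONE} --project=${PROJECT_ID}
    # While the cluster is idle this returns nothing, i.e. no power-pool VM is billing:
    kubectl get nodes -l load=on-demand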
  • Now we will create a job that spawns 4 pods running in parallel for 5 minutes.
  • The job will have the following parameters to set it apart from ordinary pods:
  • parallelism: 4: use all 4 nodes to boost performance
  • nodeSelector.load: on-demand: assign the pods to the nodes carrying that label.
  • podAntiAffinity: declare that we do not want two pods with the same label app: greedy-app running on the same node (optional).
  • tolerations: match the toleration to the taint we attached to the nodes, so these pods are allowed to be scheduled there.
  • apiVersion: batch/v1
    kind: Job
    metadata:
      name: greedy-job
    spec:
      parallelism: 4
      template:
        metadata:
          name: greedy-job
          labels:
            app: greedy-app
        spec:
          containers:
          - name: busybox
            image: busybox
            args:
            - sleep
            - "300"
          nodeSelector:
            load: on-demand
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - greedy-app
                topologyKey: "kubernetes.io/hostname"
          tolerations:
          - key: reserved-pool
            operator: Equal
            value: "true"
            effect: NoSchedule
          restartPolicy: OnFailure
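  • A hedged side note, not part of the original example: a real compute-heavy job would normally also declare resource requests, so that the cluster autoscaler sizes the pool from declared demand rather than only from the anti-affinity rule. A fragment to merge into the busybox container entry above (the values are placeholders for an n1-standard-8):

        resources:
          requests:
            cpu: "7"          # leave roughly 1 vCPU of headroom for system pods
            memory: "24Gi"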
  • Now that our cluster is in a standby state, we apply the job YAML we just created (I will call it greedyjob.yaml). The job runs four processes in parallel and completes after around 5 minutes.
  • $ kubectl get nodes
    NAME STATUS ROLES AGE VERSION
    gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 42m v1.14.10-gke.27

    $ kubectl get pods
    No resources found in default namespace.

    $ kubectl apply -f greedyjob.yaml
    job.batch/greedy-job created

    $ kubectl get pods
    NAME READY STATUS RESTARTS AGE
    greedy-job-2xbvx 0/1 Pending 0 11s
    greedy-job-72j8r 0/1 Pending 0 11s
    greedy-job-9dfdt 0/1 Pending 0 11s
    greedy-job-wqct9 0/1 Pending 0 11s
  • Our job was applied, but it is pending. Let's see what is going on in those pods:
  • $ kubectl describe pod greedy-job-2xbvx
    ...
    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Warning FailedScheduling 28s (x2 over 28s) default-scheduler 0/1 nodes are available: 1 node(s) didn't match node selector.
    Normal TriggeredScaleUp 23s cluster-autoscaler pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/owilliam/zones/us-central1-b/instanceGroups/gke-autoscale-to-zero-clus-power-pool-564148fd-grp 0->1 (max: 4)}]
  • Because of the rules we defined, the pods cannot be scheduled on the current node, which triggers a scale-up routine on the power pool. This is a very dynamic process; the node was up and running roughly 90 seconds later:
  • $ kubectl get pods
    NAME READY STATUS RESTARTS AGE
    greedy-job-2xbvx 0/1 Pending 0 93s
    greedy-job-72j8r 0/1 ContainerCreating 0 93s
    greedy-job-9dfdt 0/1 Pending 0 93s
    greedy-job-wqct9 0/1 Pending 0 93s

    $ kubectl get nodes
    NAME STATUS ROLES AGE VERSION
    gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 44m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 11s v1.14.10-gke.27
  • Since we set a pod anti-affinity rule, the second pod cannot be scheduled on the node that was just brought up, which triggers the next scale-up. Take a look at the events on the second pod:
  • $ kubectl describe pod greedy-job-2xbvx
    ...
    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Normal TriggeredScaleUp 2m45s cluster-autoscaler pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/owilliam/zones/us-central1-b/instanceGroups/gke-autoscale-to-zero-clus-power-pool-564148fd-grp 0->1 (max: 4)}]
    Warning FailedScheduling 93s (x3 over 2m50s) default-scheduler 0/1 nodes are available: 1 node(s) didn't match node selector.
    Warning FailedScheduling 79s (x3 over 83s) default-scheduler 0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) had taints that the pod didn't tolerate.
    Normal TriggeredScaleUp 62s cluster-autoscaler pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/owilliam/zones/us-central1-b/instanceGroups/gke-autoscale-to-zero-clus-power-pool-564148fd-grp 1->2 (max: 4)}]
    Warning FailedScheduling 3s (x3 over 68s) default-scheduler 0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules.
  • The same process repeats until all requirements are satisfied:
  • $ kubectl get pods
    NAME READY STATUS RESTARTS AGE
    greedy-job-2xbvx 0/1 Pending 0 3m39s
    greedy-job-72j8r 1/1 Running 0 3m39s
    greedy-job-9dfdt 0/1 Pending 0 3m39s
    greedy-job-wqct9 1/1 Running 0 3m39s

    $ kubectl get nodes
    NAME STATUS ROLES AGE VERSION
    gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 46m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 2m16s v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-sf6q Ready <none> 28s v1.14.10-gke.27

    $ kubectl get pods
    NAME READY STATUS RESTARTS AGE
    greedy-job-2xbvx 0/1 Pending 0 5m19s
    greedy-job-72j8r 1/1 Running 0 5m19s
    greedy-job-9dfdt 1/1 Running 0 5m19s
    greedy-job-wqct9 1/1 Running 0 5m19s

    $ kubectl get nodes
    NAME STATUS ROLES AGE VERSION
    gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 48m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-39m2 Ready <none> 63s v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 4m8s v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-sf6q Ready <none> 2m20s v1.14.10-gke.27

    $ kubectl get pods
    NAME READY STATUS RESTARTS AGE
    greedy-job-2xbvx 1/1 Running 0 6m12s
    greedy-job-72j8r 1/1 Running 0 6m12s
    greedy-job-9dfdt 1/1 Running 0 6m12s
    greedy-job-wqct9 1/1 Running 0 6m12s

    $ kubectl get nodes
    NAME STATUS ROLES AGE VERSION
    gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 48m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-39m2 Ready <none> 113s v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-ggxv Ready <none> 26s v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 4m58s v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-sf6q Ready <none> 3m10s v1.14.10-gke.27

    (Screenshots: GKE console showing the power pool scaled up to 4 nodes.)
    Here we can see that all nodes are now up and running (and therefore billed per second).
  • Now that all jobs are running, after a few minutes they finish their tasks:
  • $ kubectl get pods
    NAME READY STATUS RESTARTS AGE
    greedy-job-2xbvx 1/1 Running 0 7m22s
    greedy-job-72j8r 0/1 Completed 0 7m22s
    greedy-job-9dfdt 1/1 Running 0 7m22s
    greedy-job-wqct9 1/1 Running 0 7m22s

    $ kubectl get pods
    NAME READY STATUS RESTARTS AGE
    greedy-job-2xbvx 0/1 Completed 0 11m
    greedy-job-72j8r 0/1 Completed 0 11m
    greedy-job-9dfdt 0/1 Completed 0 11m
    greedy-job-wqct9 0/1 Completed 0 11m
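  • Optional cleanup, not part of the original answer: deleting the finished Job also removes its Completed pods from the listing (they do not block scale-down; this is just housekeeping):

    kubectl delete job greedy-job
    # On newer Kubernetes versions you can instead set spec.ttlSecondsAfterFinished
    # on the Job so it is garbage-collected automatically.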
  • Once the tasks are completed, the autoscaler starts scaling the cluster back down.
  • You can read more about the rules for this process here: GKE Cluster AutoScaler (an optional tuning sketch follows below).
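  • Optional tuning sketch (not part of the original answer; availability of the flag depends on your GKE and gcloud versions): the optimize-utilization autoscaling profile makes the autoscaler remove idle nodes more aggressively, shortening the billed tail after a burst:

    gcloud container clusters update ${GKE_CLUSTER_NAME} \
      --autoscaling-profile optimize-utilization \
      --zone=${GCP_ZONE} --project=${PROJECT_ID}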
  • $ while true; do kubectl get nodes ; sleep 60; done
    NAME STATUS ROLES AGE VERSION
    gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 54m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-39m2 Ready <none> 7m26s v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-ggxv Ready <none> 5m59s v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 10m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-sf6q Ready <none> 8m43s v1.14.10-gke.27

    NAME STATUS ROLES AGE VERSION
    gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 62m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-39m2 Ready <none> 15m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-ggxv Ready <none> 14m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 18m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-sf6q NotReady <none> 16m v1.14.10-gke.27
  • Once the conditions are met, the autoscaler marks the nodes as NotReady and starts removing them:
  • NAME                                                  STATUS     ROLES    AGE   VERSION
    gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 64m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-39m2 NotReady <none> 17m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-ggxv NotReady <none> 16m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 20m v1.14.10-gke.27

    NAME STATUS ROLES AGE VERSION
    gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 65m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-39m2 NotReady <none> 18m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-ggxv NotReady <none> 17m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw NotReady <none> 21m v1.14.10-gke.27

    NAME STATUS ROLES AGE VERSION
    gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 66m v1.14.10-gke.27
    gke-autoscale-to-zero-clus-power-pool-564148fd-ggxv NotReady <none> 18m v1.14.10-gke.27

    NAME STATUS ROLES AGE VERSION
    gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 67m v1.14.10-gke.27

  • And here is the confirmation that the nodes have been removed from GKE and from Compute Engine (remember that every node is a virtual machine billed through Compute Engine):

  • Compute Engine: (note that gke-cluster-1-default-pool belongs to a different cluster; I added it to the screenshot to show you that, apart from the default persistent node, there are no other nodes in the gke-autoscale-to-zero cluster.)
    (Screenshot: Compute Engine VM instances list.)

    GKE:
    (Screenshot: GKE cluster with the power pool back at 0 nodes.)

    Final thoughts:

    When scaling down, cluster autoscaler respects scheduling and eviction rules set on Pods. These restrictions can prevent a node from being deleted by the autoscaler. A node's deletion could be prevented if it contains a Pod that meets certain conditions. An application's PodDisruptionBudget can also prevent autoscaling; if deleting nodes would cause the budget to be exceeded, the cluster does not scale down.
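
    For illustration only (the application name is hypothetical): a PodDisruptionBudget that keeps at least 2 pods of "my-api" available can be created as shown below, and it will stop the autoscaler from removing a node if that would drop the available count below 2:

    kubectl create poddisruptionbudget my-api-pdb \
      --selector=app=my-api --min-available=2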



    You can see that the process is really fast: in our example it took around 90 seconds to scale a node up and about 5 minutes to finish scaling an idle node down, which is a big improvement for your bill.
  • Preemptible VMs can reduce the bill even further, but you have to consider the kind of workload you are running (a hedged sketch follows the quote below):

  • Preemptible VMs are Compute Engine VM instances that last a maximum of 24 hours and provide no availability guarantees. Preemptible VMs are priced lower than standard Compute Engine VMs and offer the same machine types and options.
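
    A hedged sketch of the same power pool built from preemptible VMs instead (only suitable if your jobs tolerate being interrupted and retried):

    gcloud container node-pools create power-pool-preemptible \
      --cluster=${GKE_CLUSTER_NAME} \
      --machine-type=n1-standard-8 \
      --preemptible \
      --node-labels=load=on-demand \
      --node-taints=reserved-pool=true:NoSchedule \
      --enable-autoscaling --min-nodes=0 --max-nodes=4 \
      --zone=${GCP_ZONE} --project=${PROJECT_ID}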



    I know you are still considering the best architecture for your application.

    Using App Engine or AI Platform might also be an optimal solution, but since you are currently running your workloads on GKE, I wanted to show you an example as requested.

    If you have further questions, let me know in the comments.

    About kubernetes - How to scale a Kubernetes cluster while limiting GCP costs: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61626566/
