bash - GitLab job succeeds but does not finish (creating/deleting Azure AKS)

Reposted. Author: 行者123. Updated: 2023-12-02 12:31:38
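I am using a runner to dynamically create an AKS cluster and delete the previous one.

Unfortunately, these jobs take a while, and I now frequently see the job stop abruptly (somewhere upwards of 5 minutes after the az aks delete or az aks create call).

This happens in GitLab; after several retries it usually succeeds once.

Some googling suggested that before and after scripts might have an effect... but removing them made no difference.

Are there any runner rules or special settings that might need to be changed? It would be easier to understand if the job stopped with a timeout error, but it is reported as succeeded even though not all lines of the script have run. Here is the section that causes the problem: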

create-kubernetes-az:
  stage: create-kubernetes-az
  image: microsoft/azure-cli:latest
  # when: manual
  script:
    # REQUIRE CREATED SERVICE PRINCIPAL
    - az login --service-principal -u ${AZ_PRINC_USER} -p ${AZ_PRINC_PASSWORD} --tenant ${AZ_PRINC_TENANT}
    # Create Resource Group
    - az group create --name ${AZ_RESOURCE_GROUP} --location ${AZ_RESOURCE_LOCATION}
    # ERROR HAPPENS HERE # Delete Kubernetes Cluster // SOMETIMES STOPS AFTER THIS
    - az aks delete --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_AKS_TEST_CLUSTER} --yes
    #// OR HERE # Create Kubernetes Cluster // SOMETIMES STOPS AFTER THIS
    - az aks create --name ${AZ_AKS_TEST_CLUSTER} --resource-group ${AZ_RESOURCE_GROUP} --node-count ${AZ_AKS_TEST_NODECOUNT} --service-principal ${AZ_PRINC_USER} --client-secret ${AZ_PRINC_PASSWORD} --generate-ssh-keys
    # Get kubectl
    - az aks install-cli
    # Get Login Credentials
    - az aks get-credentials --name ${AZ_AKS_TEST_CLUSTER} --resource-group ${AZ_RESOURCE_GROUP}
    # Install Helm and Tiller on Azure Cloud Shell
    - curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get > get_helm.sh
    - chmod 700 get_helm.sh
    - ./get_helm.sh
    - helm init
    - kubectl create serviceaccount --namespace kube-system tiller
    - kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
    - kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
    # Create a namespace for your ingress resources
    - kubectl create namespace ingress-basic
    # Wait 1 minute
    - sleep 60
    # Use Helm to deploy an NGINX ingress controller
    - helm install stable/nginx-ingress --namespace ingress-basic --set controller.replicaCount=2 --set controller.nodeSelector."beta\.kubernetes\.io/os"=linux --set defaultBackend.nodeSelector."beta\.kubernetes\.io/os"=linux
    # Test by get public IP
    - kubectl get service
    - kubectl get service -l app=nginx-ingress --namespace ingress-basic
    #- while [ "$(kubectl get service -l app=nginx-ingress --namespace ingress-basic | grep pending)" == "pending" ]; do echo "Updating"; sleep 1 ; done && echo "Finished"
    - while [ "$(kubectl get service -l app=nginx-ingress --namespace ingress-basic -o jsonpath='{.items[*].status.loadBalancer.ingress[*].ip}')" == "" ]; do echo "Updating"; sleep 10 ; done && echo "Finished"
    # Add Ingress Ext IP / Alternative
    - KUBip=$(kubectl get service -l app=nginx-ingress --namespace ingress-basic -o jsonpath='{.items[*].status.loadBalancer.ingress[*].ip}')
    - echo $KUBip
    # Add DNS Name - TODO - GitLab env variables do not work
    - DNSNAME="bl-test"
    # Get the resource-id of the public ip
    - PUBLICIPID=$(az network public-ip list --query "[?ipAddress!=null]|[?contains(ipAddress, '$KUBip')].[id]" --output tsv)
    - echo $PUBLICIPID
    - az network public-ip update --ids $PUBLICIPID --dns-name $DNSNAME
    # Install CertManager Console
    # Install the CustomResourceDefinition resources separately
    - kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.8/deploy/manifests/00-crds.yaml
    # Create the namespace for cert-manager
    - kubectl create namespace cert-manager
    # Label the cert-manager namespace to disable resource validation
    - kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
    # Add the Jetstack Helm repository
    - helm repo add jetstack https://charts.jetstack.io
    # Update your local Helm chart repository cache
    - helm repo update
    # Install the cert-manager Helm chart
    - helm install --name cert-manager --namespace cert-manager --version v0.8.0 jetstack/cert-manager
    # Run Command issuer.yaml
    - sed 's/_AZ_AKS_ISSUER_NAME_/'"${AZ_AKS_ISSUER_NAME}"'/g; s/_BL_DEV_E_MAIL_/'"${BL_DEV_E_MAIL}"'/g' infrastructure/kubernetes/cluster-issuer.yaml > cluster-issuer.yaml;
    - kubectl apply -f cluster-issuer.yaml
    # Run Command ingress.yaml
    - sed 's/_BL_AZ_HOST_/'"beautylivery-test.${AZ_RESOURCE_LOCATION}.${AZ_AKS_HOST}"'/g; s/_AZ_AKS_ISSUER_NAME_/'"${AZ_AKS_ISSUER_NAME}"'/g' infrastructure/kubernetes/ingress.yaml > ingress.yaml;
    - kubectl apply -f ingress.yaml
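
A side note on the script above: the loop that waits for the ingress IP has no upper bound, so it would spin until the job timeout if the load balancer never received an address. A minimal sketch of a bounded variant follows. The helper name wait_for_output is hypothetical (it is not part of kubectl or az), and the attempt count and delay in the usage comment are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# wait_for_output is a hypothetical helper, not part of kubectl or az.
# Usage: wait_for_output <max_attempts> <delay_seconds> <command...>
# Re-runs the command until it prints something, then echoes that output
# on stdout. Progress messages go to stderr so they do not pollute the value.
wait_for_output() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1 output
  while [ "$attempt" -le "$max_attempts" ]; do
    # A failing command is treated the same as empty output.
    output=$("$@" 2>/dev/null) || output=""
    if [ -n "$output" ]; then
      printf '%s\n' "$output"
      return 0
    fi
    echo "Attempt ${attempt}/${max_attempts}: no value yet, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "Gave up after ${max_attempts} attempts" >&2
  return 1
}

# Possible use in the job (values are assumptions):
# KUBip=$(wait_for_output 60 10 kubectl get service -l app=nginx-ingress \
#   --namespace ingress-basic \
#   -o jsonpath='{.items[*].status.loadBalancer.ingress[*].ip}')
```

With this shape the step fails with a non-zero status when the IP never appears, instead of looping silently.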

Result

Running with gitlab-runner 12.3.0 (a8a019e0)
on runner-gitlab-runner-676b494b6b-b5q6h gzi97H3Q
Using Kubernetes namespace: gitlab-managed-apps
Using Kubernetes executor with image microsoft/azure-cli:latest ...
Waiting for pod gitlab-managed-apps/runner-gzi97h3q-project-14628452-concurrent-0l8wsx to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-gzi97h3q-project-14628452-concurrent-0l8wsx to be running, status is Pending
Running on runner-gzi97h3q-project-14628452-concurrent-0l8wsx via runner-gitlab-runner-676b494b6b-b5q6h...
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/****/*******/.git/
Created fresh repository.
From https://gitlab.com/****/********
* [new branch] Setup-Kubernetes -> origin/Setup-Kubernetes
Checking out d2ca489b as Setup-Kubernetes...

Skipping Git submodules setup
$ function create_secret() { # collapsed multi-line command
$ echo "current time $(TZ=Europe/Berlin date +"%F %T")"
current time 2019-10-06 09:00:50
$ az login --service-principal -u ${AZ_PRINC_USER} -p ${AZ_PRINC_PASSWORD} --tenant ${AZ_PRINC_TENANT}
[
{
"cloudName": "AzureCloud",
"id": "******",
"isDefault": true,
"name": "Nutzungsbasierte Bezahlung",
"state": "Enabled",
"tenantId": "*******",
"user": {
"name": "http://*****",
"type": "servicePrincipal"
}
}
]
$ az group create --name ${AZ_RESOURCE_GROUP} --location ${AZ_RESOURCE_LOCATION}
{
"id": "/subscriptions/*********/resourceGroups/*****",
"location": "francecentral",
"managedBy": null,
"name": "******",
"properties": {
"provisioningState": "Succeeded"
},
"tags": null,
"type": "Microsoft.Resources/resourceGroups"
}
$ az aks delete --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_AKS_TEST_CLUSTER} --yes
Running after script...
$ echo "current time $(TZ=Europe/Berlin date +"%F %T")"
current time 2019-10-06 09:05:55
Job succeeded
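
To put a number on the gap, the two "current time" lines in the log above can be subtracted. This is only a small worked calculation; it assumes a date implementation that accepts -d with a "YYYY-MM-DD hh:mm:ss" string (GNU or BusyBox date).

```shell
#!/usr/bin/env bash
# Timestamps copied from the two "current time" lines in the job log.
start="2019-10-06 09:00:50"
end="2019-10-06 09:05:55"

# Convert both to epoch seconds. Only the difference matters, so both are
# parsed in the same fixed zone (UTC) to keep the result host-independent.
start_s=$(TZ=UTC date -d "$start" +%s)
end_s=$(TZ=UTC date -d "$end" +%s)

elapsed=$((end_s - start_s))
echo "elapsed: ${elapsed}s ($((elapsed / 60))m $((elapsed % 60))s)"
# prints: elapsed: 305s (5m 5s)
```

So the job was cut off about five minutes after its first logged command, which matches the "5 minutes+" observation in the question.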

Is there any way to make the job run to completion? And, in the best case, succeed?

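Update: The idea behind this: I am trying to automate setting up a complete Kubernetes cluster with SSL and DNS management. Everything should be quick to set up and ready for different use cases and different environments in the future. I also want to learn how to do this better :)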

New update:

Added the solution.

Best answer

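I added a small workaround, since I suspected that something needs to be invoked at regular intervals...

So far, the az aks wait command seems to do the trick for me. The preceding command needs --no-wait so that the script continues to the wait step.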

    # Delete Kubernetes Cluster
    - az aks delete --resource-group ${AZ_RESOURCE_GROUP} --name ${AZ_AKS_TEST_CLUSTER} --no-wait --yes
    - az aks wait --deleted -g ${AZ_RESOURCE_GROUP} -n ${AZ_AKS_TEST_CLUSTER} --updated --interval 60 --timeout 1800
    # Create Kubernetes Cluster
    - az aks create --name ${AZ_AKS_TEST_CLUSTER} --resource-group ${AZ_RESOURCE_GROUP} --node-count ${AZ_AKS_TEST_NODECOUNT} --service-principal ${AZ_PRINC_USER} --client-secret ${AZ_PRINC_PASSWORD} --generate-ssh-keys --no-wait
    - az aks wait --created -g ${AZ_RESOURCE_GROUP} -n ${AZ_AKS_TEST_CLUSTER} --updated --interval 60 --timeout 1800
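
The post does not establish why the runner drops the job. If the cause is the long silence while az blocks (an assumption, in line with the guess above that something has to happen at regular intervals), another way to get periodic activity is to run the blocking command in the background and print a line while it is still running. The sketch below is not the approach used in the answer; run_with_heartbeat is a hypothetical helper name, and the 60-second interval in the usage comment is only borrowed from the --interval value above.

```shell
#!/usr/bin/env bash
# run_with_heartbeat is a hypothetical helper, not an az or GitLab feature.
# Usage: run_with_heartbeat <interval_seconds> <command...>
# Runs the command in the background, prints a progress line every
# <interval_seconds> while it is still running, and returns the command's
# own exit status. The command line itself is never echoed, so secrets
# passed as arguments do not end up in the job log.
run_with_heartbeat() {
  local interval=$1
  shift
  "$@" &
  local pid=$!
  local elapsed=0
  # kill -0 only tests whether the process still exists; it sends no signal.
  while kill -0 "$pid" 2>/dev/null; do
    sleep 1
    elapsed=$((elapsed + 1))
    if [ $((elapsed % interval)) -eq 0 ] && kill -0 "$pid" 2>/dev/null; then
      echo "still running after ${elapsed}s"
    fi
  done
  # Collect the exit status of the background command.
  wait "$pid"
}

# Possible use in the job, assuming the function is defined earlier in the
# same shell session (for example in before_script):
# run_with_heartbeat 60 az aks delete --resource-group ${AZ_RESOURCE_GROUP} \
#   --name ${AZ_AKS_TEST_CLUSTER} --yes
```

Unlike the --no-wait plus az aks wait pair, this keeps the original blocking call and its exit status, at the cost of an extra helper in the pipeline.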

A similar question on this topic (bash - GitLab job succeeds but does not finish (creating/deleting Azure AKS)) can be found on Stack Overflow: https://stackoverflow.com/questions/58255901/
