argo-workflows - Argo sample workflow stuck in the Pending state

Reposted. Author: 行者123. Updated: 2023-12-04 10:01:35

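I followed the Argo Workflows Getting Started documentation. Everything went smoothly until I ran the first sample workflow, as described in 4. Run Sample Workflows. The workflow just stays in the Pending state: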

vagrant@master:~$ argo submit --watch https://raw.githubusercontent.com/argoproj/argo/master/examples/hello-world.yaml
Name: hello-world-z4lbs
Namespace: default
ServiceAccount: default
Status: Pending
Created: Thu May 14 12:36:45 +0000 (now)

vagrant@master:~$ argo list
NAME STATUS AGE DURATION PRIORITY
hello-world-z4lbs Pending 27m 0s 0

It was mentioned here that taints on the cluster nodes might be the problem, so I removed the taint from the master node:
vagrant@master:~$ kubectl taint nodes --all node-role.kubernetes.io/master-
node/master untainted
taint "node-role.kubernetes.io/master" not found
taint "node-role.kubernetes.io/master" not found
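
The "not found" lines above most likely just mean that the two worker nodes never carried the master taint. To confirm which taints remain on each node, one option is to list the taint keys per node. This is a minimal sketch: the function name `node_taints` is hypothetical, and it assumes `kubectl` is already configured for the cluster.

```shell
# node_taints is a hypothetical helper name, not part of kubectl.
# It prints one row per node with the keys of any taints still set;
# nodes without taints show <none> in the TAINTS column.
node_taints() {
  kubectl get nodes \
    -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

Calling `node_taints` after the untaint command should show whether any node still carries a taint that could block scheduling.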

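Then I deleted the pending workflow and resubmitted it, but it got stuck in the Pending state again.

Details of the newly submitted workflow, which is also stuck: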
vagrant@master:~$ kubectl describe workflow hello-world-8kvmb
Name: hello-world-8kvmb
Namespace: default
Labels: <none>
Annotations: <none>
API Version: argoproj.io/v1alpha1
Kind: Workflow
Metadata:
Creation Timestamp: 2020-05-14T13:57:44Z
Generate Name: hello-world-
Generation: 1
Managed Fields:
API Version: argoproj.io/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:generateName:
f:spec:
.:
f:arguments:
f:entrypoint:
f:templates:
f:status:
.:
f:finishedAt:
f:startedAt:
Manager: argo
Operation: Update
Time: 2020-05-14T13:57:44Z
Resource Version: 16780
Self Link: /apis/argoproj.io/v1alpha1/namespaces/default/workflows/hello-world-8kvmb
UID: aa82d005-b7ac-411f-9d0b-93f34876b673
Spec:
Arguments:
Entrypoint: whalesay
Templates:
Arguments:
Container:
Args:
hello world
Command:
cowsay
Image: docker/whalesay:latest
Name:
Resources:
Inputs:
Metadata:
Name: whalesay
Outputs:
Status:
Finished At: <nil>
Started At: <nil>
Events: <none>

When I try to get the workflow controller logs, I get the following error:
vagrant@master:~$ kubectl logs -n argo -l app=workflow-controller
Error from server (BadRequest): container "workflow-controller" in pod "workflow-controller-6c4787844c-lbksm" is waiting to start: ContainerCreating

Details of the corresponding workflow controller pod:
vagrant@master:~$ kubectl -n argo describe pods/workflow-controller-6c4787844c-lbksm
Name: workflow-controller-6c4787844c-lbksm
Namespace: argo
Priority: 0
Node: node-1/192.168.50.11
Start Time: Thu, 14 May 2020 12:08:29 +0000
Labels: app=workflow-controller
pod-template-hash=6c4787844c
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/workflow-controller-6c4787844c
Containers:
workflow-controller:
Container ID:
Image: argoproj/workflow-controller:v2.8.0
Image ID:
Port: <none>
Host Port: <none>
Command:
workflow-controller
Args:
--configmap
workflow-controller-configmap
--executor-image
argoproj/argoexec:v2.8.0
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from argo-token-pz4fd (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
argo-token-pz4fd:
Type: Secret (a volume populated by a Secret)
SecretName: argo-token-pz4fd
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 7m17s (x4739 over 112m) kubelet, node-1 Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 2m18s (x4950 over 112m) kubelet, node-1 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1bd1fd11dfe677c749b4a1260c29c2f8cff0d55de113d154a822e68b41f9438e" network for pod "workflow-controller-6c4787844c-lbksm": networkPlugin cni failed to set up pod "workflow-controller-6c4787844c-lbksm_argo" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/

I am running Argo 2.8:
vagrant@master:~$ argo version
argo: v2.8.0
BuildDate: 2020-05-11T22:55:16Z
GitCommit: 8f696174746ed01b9bf1941ad03da62d312df641
GitTreeState: clean
GitTag: v2.8.0
GoVersion: go1.13.4
Compiler: gc
Platform: linux/amd64

I checked the cluster status, and it looks fine:
vagrant@master:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 95m v1.18.2
node-1 Ready <none> 92m v1.18.2
node-2 Ready <none> 92m v1.18.2

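As for the K8s cluster installation, I created it with Vagrant as described here; the only differences are:
  • libvirt as the provider
  • a newer version of Ubuntu: generic/ubuntu1804
  • a newer version of Calico: v3.14

Any idea why the workflow is stuck in the Pending state and how to fix it?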

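    Best answer

    Workflows start in the Pending state and are then moved through their steps by the workflow controller pod (installed in the cluster as part of Argo).
    Here the workflow controller pod is itself stuck in ContainerCreating, so nothing advances the workflow. kubectl describe po {workflow-controller pod} shows a Calico-related network error.
    As mentioned in the comments, this looks like a common Calico error. Once you clear it up, your hello-world workflow should execute fine.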
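
To see why Calico is failing, a reasonable first step is to look at the events and the logs of a calico-node pod. The following is a minimal sketch, not a definitive procedure: the function name `calico_pod_diagnostics` is hypothetical, and it assumes the Calico pods run in the `kube-system` namespace, as they do in this cluster.

```shell
# calico_pod_diagnostics is a hypothetical helper name.
# $1: name of a calico-node pod in the kube-system namespace.
calico_pod_diagnostics() {
  pod="$1"
  # The Events section at the end of the description often states
  # why the pod is not becoming ready.
  kubectl -n kube-system describe pod "$pod"
  # Last 50 log lines of the previous (crashed) container instance.
  kubectl -n kube-system logs "$pod" --previous --tail=50
}
```

Call it with the name of one of the failing pods, for example `calico_pod_diagnostics <calico-node pod name>`. The `--previous` flag only returns output if the container has restarted at least once; drop it to read the current instance's logs instead.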
    Note from the OP: further debugging confirmed the Calico problem (the calico-node pods are not ready, and most are in CrashLoopBackOff):

    vagrant@master:~$ kubectl get pods --all-namespaces
    NAMESPACE NAME READY STATUS RESTARTS AGE
    argo argo-server-84946785b-94bfs 0/1 ContainerCreating 0 3h59m
    argo workflow-controller-6c4787844c-lbksm 0/1 ContainerCreating 0 3h59m
    kube-system calico-kube-controllers-74d45555dd-zhkp6 0/1 CrashLoopBackOff 56 3h59m
    kube-system calico-node-2n9kt 0/1 CrashLoopBackOff 72 3h59m
    kube-system calico-node-b8sb8 0/1 Running 70 3h56m
    kube-system calico-node-pslzs 0/1 CrashLoopBackOff 67 3h56m
    kube-system coredns-66bff467f8-rmxsp 0/1 ContainerCreating 0 3h59m
    kube-system coredns-66bff467f8-z4lbq 0/1 ContainerCreating 0 3h59m
    kube-system etcd-master 1/1 Running 2 3h59m
    kube-system kube-apiserver-master 1/1 Running 2 3h59m
    kube-system kube-controller-manager-master 1/1 Running 2 3h59m
    kube-system kube-proxy-k59ks 1/1 Running 2 3h59m
    kube-system kube-proxy-mn96x 1/1 Running 1 3h56m
    kube-system kube-proxy-vxj8b 1/1 Running 1 3h56m
    kube-system kube-scheduler-master 1/1 Running 2 3h59m
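
In a listing like the one above, the pods to look at are the ones whose READY column shows fewer ready containers than the total. A small filter can pick them out. This is a sketch under stated assumptions: the function name `unready_pods` is hypothetical, and it expects the default column layout of `kubectl get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS, ...) on standard input.

```shell
# unready_pods is a hypothetical helper name.
# Reads `kubectl get pods --all-namespaces` output on stdin and prints
# namespace, name, READY, and STATUS for every pod whose READY column
# (ready/total) shows fewer ready containers than total.
unready_pods() {
  awk 'NR > 1 {                 # skip the header row
    split($3, r, "/")           # READY looks like "0/1"
    if (r[1] != r[2]) print $1, $2, $3, $4
  }'
}
```

Typical usage would be `kubectl get pods --all-namespaces | unready_pods`. Note that pods that ran to completion also report 0/1 and would be listed, so the STATUS column still needs a look.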

    This question, argo-workflows - Argo sample workflow stuck in the Pending state, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/61799013/