
kubernetes - Scheduling GPUs in Kubernetes v1.13.1

Reposted. Author: 行者123. Updated: 2023-12-02 12:18:17

I am trying to schedule GPUs in Kubernetes v1.13.1, and I followed the guide at https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#deploying-nvidia-gpu-device-plugin

However, no GPU resource shows up when I run kubectl get nodes -o yaml. Following this post, I checked the NVIDIA GPU device plugin.

I ran the following command several times:

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml

Each time the result was:
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml": daemonsets.extensions "nvidia-device-plugin-daemonset" already exists
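
An AlreadyExists error only says that the DaemonSet object is present in the API server; it does not say that its pod is actually running on the GPU node. A rough way to check is sketched below. It assumes the namespace and DaemonSet name that appear in the outputs in this question (kube-system, nvidia-device-plugin-daemonset); plugin_pods is a hypothetical helper name, not part of kubectl.

```shell
# Names taken from the error message and node description in this question.
NS=kube-system
DS=nvidia-device-plugin-daemonset

# Hypothetical helper: keep only the device plugin pod lines from
# `kubectl get pods` output read on stdin. Prints nothing if no pod matches.
plugin_pods() {
    grep -F "$DS" || true
}

# Usage against a live cluster (not run here):
#   kubectl get daemonset "$DS" -n "$NS"
#   kubectl get pods -n "$NS" -o wide | plugin_pods
#   kubectl logs -n "$NS" <plugin-pod-name>
```

If plugin_pods prints nothing, the DaemonSet exists but has no pod scheduled. If a pod is listed, its logs are the next place to look.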

So it seems the NVIDIA device plugin is already installed? But the output of kubectl get pods --all-namespaces is:
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
kube-system   calico-node-qdhvd                  2/2     Running   0          65m
kube-system   coredns-78d4cf999f-fk4wl           1/1     Running   0          68m
kube-system   coredns-78d4cf999f-zgfvl           1/1     Running   0          68m
kube-system   etcd-liuqin01                      1/1     Running   0          67m
kube-system   kube-apiserver-liuqin01            1/1     Running   0          67m
kube-system   kube-controller-manager-liuqin01   1/1     Running   0          67m
kube-system   kube-proxy-l8p9p                   1/1     Running   0          68m
kube-system   kube-scheduler-liuqin01            1/1     Running   0          67m

When I run kubectl describe node, the GPU is not listed among the allocatable resources:
Non-terminated Pods:         (9 in total)
  Namespace    Name                                  CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------    ----                                  ------------  ----------  ---------------  -------------  ---
  kube-system  calico-node-qdhvd                     250m (2%)     0 (0%)      0 (0%)           0 (0%)         18h
  kube-system  coredns-78d4cf999f-fk4wl              100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     19h
  kube-system  coredns-78d4cf999f-zgfvl              100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     19h
  kube-system  etcd-liuqin01                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         19h
  kube-system  kube-apiserver-liuqin01               250m (2%)     0 (0%)      0 (0%)           0 (0%)         19h
  kube-system  kube-controller-manager-liuqin01      200m (1%)     0 (0%)      0 (0%)           0 (0%)         19h
  kube-system  kube-proxy-l8p9p                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         19h
  kube-system  kube-scheduler-liuqin01               100m (0%)     0 (0%)      0 (0%)           0 (0%)         19h
  kube-system  nvidia-device-plugin-daemonset-p78wz  0 (0%)        0 (0%)      0 (0%)           0 (0%)         26m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                1 (8%)      0 (0%)
  memory             140Mi (0%)  340Mi (2%)
  ephemeral-storage  0 (0%)      0 (0%)
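
When the device plugin registers successfully, the node advertises the extended resource nvidia.com/gpu under Capacity and Allocatable. A small sketch for checking this without reading the whole description by eye; has_gpu_resource is a hypothetical helper name introduced only for illustration.

```shell
# Hypothetical helper: succeeds if the text on stdin mentions the
# nvidia.com/gpu extended resource anywhere.
has_gpu_resource() {
    grep -qF 'nvidia.com/gpu'
}

# Usage against a live cluster (not run here):
#   kubectl describe node | has_gpu_resource && echo "GPU advertised" || echo "no GPU resource"
#   kubectl get nodes -o jsonpath='{.items[*].status.allocatable}'
```

In the output above the check would fail: the plugin pod is present on the node, yet nvidia.com/gpu is absent from the resource list, which points at the container runtime rather than at the DaemonSet.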

Best answer

As lianyouCat mentioned in the comments:

After installing nvidia-docker2, the default runtime of docker should be modified to nvidia docker as github.com/NVIDIA/k8s-device-plugin#preparing-your-gpu-nodes.

After modifying the /etc/docker/daemon.json, you need to restart docker so that the configuration works.
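
A minimal sketch of that change, following the daemon.json shown in the NVIDIA k8s-device-plugin README. It assumes nvidia-docker2 installed the runtime at /usr/bin/nvidia-container-runtime and that the node uses systemd; render_daemon_json is a hypothetical helper name. Note that writing the file this way replaces any existing daemon.json, so merge by hand if the file already holds other settings.

```shell
# Hypothetical helper: print a Docker daemon config that makes nvidia the
# default runtime. The runtime path is an assumption (the usual
# nvidia-docker2 install location); adjust it if yours differs.
render_daemon_json() {
    cat <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}

# Usage on the GPU node (not run here; overwrites any existing daemon.json):
#   render_daemon_json | sudo tee /etc/docker/daemon.json
#   sudo systemctl restart docker
#   docker info | grep -i 'default runtime'
```

After the restart, docker info should report nvidia as the default runtime, and the device plugin pod may need to be recreated before the node starts advertising the GPU.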

Regarding kubernetes - Scheduling GPUs in Kubernetes v1.13.1, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53894321/
