
tensorflow - How do I run a TensorFlow GPU container on Google Compute Engine?

Reposted. Author: 行者123. Updated: 2023-12-02 00:49:52

I am trying to run a TensorFlow container on Google Compute Engine with a GPU accelerator.

I tried this command:

gcloud compute instances create-with-container job-name \
--machine-type=n1-standard-4 \
--accelerator=type=nvidia-tesla-k80 \
--image-project=deeplearning-platform-release \
--image-family=common-container \
--container-image gcr.io/my-container \
--container-arg="--container-arguments=xxxx"

but got this warning:

WARNING: This container deployment mechanism requires a Container-Optimized OS image in order to work. Select an image from a cos-cloud project (cost-stable, cos-beta, cos-dev image families).



I also tried a system image from the cos-cloud project, but it does not seem to include the CUDA driver, because TensorFlow logs the warning cuInit failed.

What is the correct way to run a TensorFlow container with GPU support on Google Compute Engine?

Best answer

You can docker run your container from within the startup-script of a Deep Learning VM:


gcloud beta compute instances create deeplearningvm-$(date +"%Y%m%d-%H%M%S") \
--zone=us-central1-c \
--machine-type=n1-standard-8 \
--subnet=default \
--service-account=<your google service account> \
--scopes='https://www.googleapis.com/auth/cloud-platform' \
--accelerator=type=nvidia-tesla-k80,count=1 \
--image-project=deeplearning-platform-release \
--image-family=tf-latest-gpu \
--maintenance-policy=TERMINATE \
--metadata=install-nvidia-driver=True,startup-script='#!/bin/bash

# Check the driver until installed
while ! [[ -x "$(command -v nvidia-smi)" ]];
do
echo "sleep to check"
sleep 5s
done
echo "nvidia-smi is installed"

gcloud auth configure-docker
echo "Docker run with GPUs"
docker run --gpus all --log-driver=gcplogs --rm gcr.io/<your container>

echo "Kill VM $(hostname)"
gcloud compute instances delete $(hostname) --zone \
$(curl -H Metadata-Flavor:Google http://metadata.google.internal/computeMetadata/v1/instance/zone -s | cut -d/ -f4) -q

'
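The last step of the startup script works out the VM's own zone by cutting the fourth slash-separated field out of the metadata server's zone value. Below is a minimal sketch of just that parsing step, assuming the value has the form projects/<project-number>/zones/<zone>. The function name zone_from_metadata_path and the project number are made up for illustration and are not part of the original answer.

```shell
#!/bin/bash
# Hypothetical helper: extract the short zone name from a metadata zone path.
# Assumes the input looks like projects/<project-number>/zones/<zone>.
zone_from_metadata_path() {
  # Field 4 of the slash-separated path is the zone name.
  printf '%s\n' "$1" | cut -d/ -f4
}

# Offline example with a made-up project number (no metadata server needed).
zone_from_metadata_path "projects/123456789/zones/us-central1-c"
```

On a real VM the argument would come from the curl call in the startup script, so the helper only isolates the cut -d/ -f4 step and makes it easy to check on its own.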


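Installing the NVIDIA driver takes a few minutes, so you must wait until the installation finishes before starting the container. The documentation at https://cloud.google.com/ai-platform/deep-learning-vm/docs/tensorflow_start_instance#creating_a_tensorflow_instance_from_the_command_line says: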

Compute Engine loads the latest stable driver on the first boot and performs the necessary steps (including a final reboot to activate the driver). It may take up to 5 minutes before your VM is fully provisioned. In this time, you will be unable to SSH into your machine. When the installation is complete, to guarantee that the driver installation was successful, you can SSH in and run nvidia-smi.
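The polling loop in the startup script waits indefinitely, so a failed driver installation would leave the VM running. One possible variation is to bound the wait. The sketch below is an assumption-laden example, not the answer's method: the function name wait_for_command, the 600-second default timeout, and the 5-second default interval are all illustrative choices.

```shell
#!/bin/bash
# Hypothetical helper: wait_for_command NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls until NAME is found on PATH; returns 1 if the timeout expires first.
wait_for_command() {
  local name="$1" timeout="${2:-600}" interval="${3:-5}" waited=0
  until command -v "$name" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s waiting for $name" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$name is installed"
}

# Possible use inside the startup script (values are assumptions):
#   wait_for_command nvidia-smi 600 5 || exit 1
```

Returning a non-zero status on timeout lets the script decide whether to stop, retry, or delete the VM instead of looping forever.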

Regarding "tensorflow - How do I run a TensorFlow GPU container on Google Compute Engine?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58714973/
