
Azure-ML deployment can't see AzureML Environment (wrong version number)

Reposted. Author: 行者123. Updated: 2023-12-05 06:12:57

I followed the documentation outlined here closely.

I set up my Azure Machine Learning environment as follows:

from azureml.core import Workspace

# Connect to the workspace
ws = Workspace.from_config()

from azureml.core import Environment
from azureml.core import ContainerRegistry

myenv = Environment(name = "myenv")

myenv.inferencing_stack_version = "latest" # This will install the inference specific apt packages.

# Docker
myenv.docker.enabled = True
myenv.docker.base_image_registry.address = "myazureregistry.azurecr.io"
myenv.docker.base_image_registry.username = "myusername"
myenv.docker.base_image_registry.password = "mypassword"
myenv.docker.base_image = "4fb3..."
myenv.docker.arguments = None

# Environment variable (I need Python to look at these folders)
myenv.environment_variables = {"PYTHONPATH":"/root"}

# python
myenv.python.user_managed_dependencies = True
myenv.python.interpreter_path = "/opt/miniconda/envs/myenv/bin/python"

from azureml.core.conda_dependencies import CondaDependencies
conda_dep = CondaDependencies()
conda_dep.add_pip_package("azureml-defaults")
myenv.python.conda_dependencies=conda_dep

myenv.register(workspace=ws) # works!

I have a score.py file configured for inference (it is not relevant to the problem I'm having)...

Then I set up the inference config:

from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

I set up my compute cluster:

from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException

# Choose a name for your cluster
aks_name = "theclustername"

# Check to see if the cluster already exists
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6_Promo")

    aks_target = ComputeTarget.create(workspace=ws, name=aks_name, provisioning_configuration=prov_config)

    aks_target.wait_for_completion(show_output=True)

from azureml.core.webservice import AksWebservice

# Example
gpu_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,
                                                    num_replicas=3,
                                                    cpu_cores=4,
                                                    memory_gb=10)

All of this succeeds. Then I try to deploy the model for inference:

from azureml.core.model import Model

model = Model(ws, name="thenameofmymodel")

# Name of the web service that is deployed
aks_service_name = 'tryingtodeply'

# Deploy the model
aks_service = Model.deploy(ws,
                           aks_service_name,
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=gpu_aks_config,
                           deployment_target=aks_target,
                           overwrite=True)

aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)

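It fails, saying it can't find the environment. More specifically, my environment is at version 11, but the deployment keeps looking for a version one higher than the current one (i.e., version 12):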

Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: 0f03a025-3407-4dc1-9922-a53cc27267d4
More information can be found here:
Error:
{
  "code": "BadRequest",
  "statusCode": 400,
  "message": "The request is invalid",
  "details": [
    {
      "code": "EnvironmentDetailsFetchFailedUserError",
      "message": "Failed to fetch details for Environment with Name: myenv Version: 12."
    }
  ]
}

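I tried manually editing the environment JSON to match the version azureml was trying to fetch, but that had no effect. Can anyone see what's wrong with this code?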
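When comparing the version the service asks for against the version that is actually registered, it can help to pull the name and version out of the error payload programmatically. The following is a minimal sketch using only the standard library; the helper name requested_environments is hypothetical, and the regular expression assumes the message wording shown in the error above, which is not a documented contract.

```python
import json
import re

# Assumption: the detail message keeps the "Name: <name> Version: <n>" wording
# seen in the error above. This is not a documented format.
_ENV_PATTERN = re.compile(r"Name:\s*(\S+)\s+Version:\s*(\d+)")


def requested_environments(error_json):
    """Return (name, version) pairs mentioned in the error's detail messages."""
    payload = json.loads(error_json)
    found = []
    for detail in payload.get("details", []):
        match = _ENV_PATTERN.search(detail.get("message", ""))
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


# The error body from the failed deployment, copied from the output above.
error_text = """
{
  "code": "BadRequest",
  "statusCode": 400,
  "message": "The request is invalid",
  "details": [
    {
      "code": "EnvironmentDetailsFetchFailedUserError",
      "message": "Failed to fetch details for Environment with Name: myenv Version: 12."
    }
  ]
}
"""

print(requested_environments(error_text))  # prints [('myenv', 12)]
```

The extracted version can then be compared with the version of the environment object that was registered, which makes the off-by-one mismatch easy to spot.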

Update

Changing the environment name (e.g., my_inference_env) and passing it to InferenceConfig seems to be on the right track. However, the error has now changed to the following:

Running..........
Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: f0dfc13b-6fb6-494b-91a7-de42b9384692
More information can be found here: https://some_long_http_address_that_leads_to_nothing
Error:
{
  "code": "DeploymentFailed",
  "statusCode": 404,
  "message": "Deployment not found"
}

Solution

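Anders's answer below about how to use Azure ML environments is indeed correct. However, the last error I hit was because I set the container image using its digest value (a sha) rather than the image name and tag (e.g., imagename:tag). Note this line of code from the first block: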

myenv.docker.base_image = "4fb3..." 

I was referencing the digest value, but it should be changed to

myenv.docker.base_image = "imagename:tag"

Once I made that change, the deployment succeeded! :)
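Since the digest-versus-tag mistake only surfaced as an unhelpful "Deployment not found" error, a pre-flight check on the base image string before deploying may save a round trip. The sketch below is a heuristic of my own, not part of the azureml SDK; the function names is_bare_digest and has_name_and_tag are hypothetical, and a repository whose name happens to be 12 or more hex characters would be misclassified as a digest.

```python
import re

# Heuristic: an optional "sha256:" prefix followed by 12 to 64 hex characters.
_HEX_DIGEST = re.compile(r"(?:sha256:)?[0-9a-f]{12,64}")


def is_bare_digest(ref):
    """True if the reference looks like a digest with no image name."""
    return _HEX_DIGEST.fullmatch(ref.strip().lower()) is not None


def has_name_and_tag(ref):
    """True if the reference has the form [registry/]name:tag."""
    ref = ref.strip()
    if not ref or "@" in ref or is_bare_digest(ref):
        return False
    # Only the last path component may carry the tag; a colon earlier in the
    # reference belongs to a registry port such as myregistry:5000.
    last_component = ref.rsplit("/", 1)[-1]
    name, sep, tag = last_component.partition(":")
    return bool(name) and sep == ":" and bool(tag)


for candidate in ("imagename:tag", "imagename", "sha256:" + "a" * 64):
    print(candidate[:20], has_name_and_tag(candidate))
```

A check like has_name_and_tag(myenv.docker.base_image) could run just before the call to Model.deploy, raising early if it returns False.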

Best answer

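One concept that took me a while to grasp is the split between registering an Azure ML environment and using one. If you have already registered your environment myenv and none of its details have changed, there is no need to re-register it with myenv.register(). You can simply fetch the registered environment with Environment.get(), like so: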

myenv = Environment.get(ws, name='myenv', version=11)

My suggestion is to give your environment a new name, for example "model_scoring_env". Register it once, then pass it to InferenceConfig.
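The register-once-then-fetch pattern can be wrapped in a small helper, so that rerunning a notebook does not register the environment again. This is a sketch rather than an official recipe: get_or_register and its build_env and env_api parameters are hypothetical names introduced here, and env_api exists only so the helper can be exercised without a live workspace. By default it relies on Environment.list, Environment.get and register from azureml.core.

```python
def get_or_register(ws, name, build_env, env_api=None):
    """Return the registered environment `name`, registering it only if absent.

    ws        -- the azureml Workspace
    name      -- environment name, e.g. "model_scoring_env"
    build_env -- callable taking the name and returning an unregistered
                 Environment (hypothetical; it would hold the Docker and
                 Python settings from the question)
    env_api   -- class providing list/get; defaults to azureml.core.Environment
    """
    if env_api is None:
        # Imported lazily so the helper can be tested without the SDK installed.
        from azureml.core import Environment as env_api

    # Environment.list returns a dict keyed by environment name.
    if name in env_api.list(workspace=ws):
        return env_api.get(workspace=ws, name=name)

    # First run only: build the definition and register it once.
    return build_env(name).register(workspace=ws)
```

The returned object would then be passed straight to InferenceConfig(entry_script="score.py", environment=...). Note that this helper deliberately does not re-register when the definition changes; in that case, registering explicitly and fetching a specific version with Environment.get is probably the clearer route.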

Regarding "Azure-ML deployment can't see AzureML Environment (wrong version number)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63458904/
