
python - How to connect AMLS to ADLS Gen 2?

Reposted. Author: 行者123. Updated: 2023-12-04 11:47:14

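I want to register a dataset from ADLS Gen2 in my Azure Machine Learning workspace (azureml-core==1.12.0). Since the Python SDK documentation for Datastore.register_azure_data_lake_gen2() does not list the service principal information as required, I successfully registered ADLS Gen2 as a datastore with the following code: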

import os

from azureml.core import Datastore

# `ws` is an existing azureml.core.Workspace object
adlsgen2_datastore_name = os.environ['adlsgen2_datastore_name']
account_name = os.environ['account_name']  # ADLS Gen2 account name
file_system = os.environ['filesystem']

# No service principal credentials are passed here
adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name,
    filesystem=file_system
)
However, when I try to register the dataset with
from azureml.core import Dataset
adls_ds = Datastore.get(ws, datastore_name=adlsgen2_datastore_name)
data = Dataset.Tabular.from_delimited_files((adls_ds, 'folder/data.csv'))
I get this error:

Cannot load any data from the specified path. Make sure the path is accessible and contains data.ScriptExecutionException was caused by StreamAccessException.StreamAccessException was caused by AuthenticationException.'AdlsGen2-ReadHeaders' for '[REDACTED]' on storage failed with status code 'Forbidden' (This request is not authorized to perform this operation using this permission.), client request ID <CLIENT_REQUEST_ID>, request ID <REQUEST_ID>. Error message: [REDACTED]| session_id=<SESSION_ID>


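Do I need to enable a service principal for this to work? In the ML Studio UI, it seems a service principal is required even to register the datastore.
Another thing I noticed is that AMLS tries to access the dataset at the dfs endpoint: https://adls_gen2_account_name.dfs.core.windows.net/container/folder/data.csv whereas the actual URI in ADLS Gen2 uses the blob endpoint: https://adls_gen2_account_name.blob.core.windows.net/container/folder/data.csv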

Best answer

According to this documentation, you need to enable a service principal.
1. Register your application and grant the service principal the Storage Blob Data Reader role.
2. Try this code:

# tenant_id, client_id and client_secret belong to the registered service principal
adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name,
    filesystem=file_system,
    tenant_id=tenant_id,
    client_id=client_id,
    client_secret=client_secret
)

adls_ds = Datastore.get(ws, datastore_name=adlsgen2_datastore_name)
dataset = Dataset.Tabular.from_delimited_files((adls_ds,'sample.csv'))
print(dataset.to_pandas_dataframe())
Result: (screenshot in the original answer, not reproduced here)
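
The answer's code leaves tenant_id, client_id and client_secret undefined. As a minimal sketch, and assuming you keep them in environment variables the same way the question does for the account name, they could be collected like this. The variable names AZURE_TENANT_ID, AZURE_CLIENT_ID and AZURE_CLIENT_SECRET and the helper load_service_principal are hypothetical and not part of the original post or of the Azure ML SDK:

```python
import os

# Hypothetical environment variable names; change them to match your setup.
# Keys are the keyword arguments used in the answer's registration call.
REQUIRED_VARS = {
    "tenant_id": "AZURE_TENANT_ID",
    "client_id": "AZURE_CLIENT_ID",
    "client_secret": "AZURE_CLIENT_SECRET",
}


def load_service_principal(env=None):
    """Return the service principal values as keyword arguments.

    Raises KeyError listing every variable that is missing or empty,
    so a misconfiguration fails before any call to Azure is made.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS.values() if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {kwarg: env[name] for kwarg, name in REQUIRED_VARS.items()}
```

With this helper the three credential arguments in the registration call above could be replaced by **load_service_principal(), which also keeps the client secret out of the source file.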

Regarding "python - How to connect AMLS to ADLS Gen 2?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63891547/
