
python - Why can't Databricks Python read data from my Azure Datalake Storage Gen1?

Reposted. Author: 行者123. Updated: 2023-12-04 17:34:28

I am trying to read the file mydir/mycsv.csv from Azure Data Lake Storage Gen1 in a Databricks notebook, using the following syntax (inspired by the documentation):

configs = {
    "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
    "dfs.adls.oauth2.client.id": "123abc-1e42-31415-9265-12345678",
    "dfs.adls.oauth2.credential": dbutils.secrets.get(scope = "adla", key = "adlamaywork"),
    "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/123456abc-2718-aaaa-9999-42424242abc/oauth2/token",
}

dbutils.fs.mount(
    source = "adl://myadls.azuredatalakestore.net/mydir",
    mount_point = "/mnt/adls",
    extra_configs = configs)

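# Read the CSV through the mount point; this is a Spark DataFrame
post_processed = spark.read.csv("/mnt/adls/mycsv.csv")

# Keep the first 10 rows, convert them to pandas, and write them to DBFS
post_processed.limit(10).toPandas().to_csv("/dbfs/processed.csv")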

dbutils.fs.unmount("/mnt/adls")
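
The mount configuration above can also be assembled by a small helper, so that the tenant ID and the refresh URL cannot drift apart. This is only a minimal sketch: the function name `build_adls_gen1_configs` and its parameters are hypothetical, and only the configuration keys and placeholder values come from the question.

```python
def build_adls_gen1_configs(client_id: str, credential: str, tenant_id: str) -> dict:
    """Return an extra_configs dict for an ADLS Gen1 mount (hypothetical helper)."""
    # Fail early on empty values instead of at token-fetch time.
    for name, value in (("client_id", client_id),
                        ("credential", credential),
                        ("tenant_id", tenant_id)):
        if not value or not value.strip():
            raise ValueError(f"{name} must not be empty")
    return {
        "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
        "dfs.adls.oauth2.client.id": client_id,
        "dfs.adls.oauth2.credential": credential,
        "dfs.adls.oauth2.refresh.url":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Placeholder values from the question. In a notebook the credential would
# come from dbutils.secrets.get(scope="adla", key="adlamaywork").
configs = build_adls_gen1_configs(
    client_id="123abc-1e42-31415-9265-12345678",
    credential="<secret value>",
    tenant_id="123456abc-2718-aaaa-9999-42424242abc",
)
```

The resulting dictionary can be passed as `extra_configs` to `dbutils.fs.mount` exactly as in the code above.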

My client 123abc-1e42-31415-9265-12345678 has access to the Data Lake Storage account myadls, and I have created the secret with:

databricks secrets put --scope adla --key adlamaywork

When I run the PySpark code above in a Databricks notebook, the call to spark.read.csv that accesses the CSV file fails with:

com.microsoft.azure.datalake.store.ADLException: Error getting info for file /mydir/mycsv.csv

When I browse DBFS with dbfs ls dbfs:/mnt/adls, the parent mount point appears to exist, but I get:

Error: b'{"error_code":"IO_ERROR","message":"Error fetching access token\nLast encountered exception thrown after 1 tries [HTTP0(null)]"}'

What am I doing wrong?
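
The "Error fetching access token" message suggests that the OAuth2 token request itself is failing, before any file is read. One way to narrow this down is to request a token outside Databricks with the same client ID, secret, and refresh URL. The sketch below only builds that client-credentials request. The helper name `build_token_request` and the `resource` value `https://datalake.azure.net/` are assumptions that do not appear in the original post, and nothing is sent over the network unless you call `urlopen` yourself.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(refresh_url: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) an OAuth2 client-credentials token request."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Assumed resource identifier for Data Lake Storage Gen1.
        "resource": "https://datalake.azure.net/",
    }).encode("utf-8")
    return Request(
        refresh_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values from the question.
req = build_token_request(
    "https://login.microsoftonline.com/123456abc-2718-aaaa-9999-42424242abc/oauth2/token",
    "123abc-1e42-31415-9265-12345678",
    "<secret value>",
)

# To send it (needs network access and real credentials):
#   from urllib.request import urlopen
#   print(urlopen(req).read())
```

If this request is rejected with real values, the problem most likely lies in the client ID, the secret, or the tenant in the refresh URL rather than in the mount itself.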

Best answer

If you do not strictly need to mount the directory into DBFS, you can try reading directly from ADLS, like this:

spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.access.token.provider", "org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider")
spark.conf.set("dfs.adls.oauth2.client.id", "123abc-1e42-31415-9265-12345678")
spark.conf.set("dfs.adls.oauth2.credential", dbutils.secrets.get(scope = "adla", key = "adlamaywork"))
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/123456abc-2718-aaaa-9999-42424242abc/oauth2/token")

csvFile = "adl://myadls.azuredatalakestore.net/mydir/mycsv.csv"

df = spark.read.format('csv').options(header='true', inferschema='true').load(csvFile)
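
The direct-read approach comes down to two steps: set the session-level configuration, then load an `adl://` URI. Below is a minimal sketch of both steps as reusable helpers. The names `adl_uri` and `apply_conf` are hypothetical, and `apply_conf` only assumes that its first argument exposes `conf.set(key, value)`, as a SparkSession does.

```python
from typing import Mapping


def adl_uri(account: str, path: str) -> str:
    """Build an adl:// URI for a file in an ADLS Gen1 account."""
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"


def apply_conf(spark, settings: Mapping[str, str]) -> None:
    """Apply each key/value pair to the session configuration."""
    for key, value in settings.items():
        spark.conf.set(key, value)


csv_file = adl_uri("myadls", "mydir/mycsv.csv")

# In a notebook, where `spark` and `dbutils` are predefined:
#   apply_conf(spark, {"dfs.adls.oauth2.client.id": "...", ...})
#   df = spark.read.format("csv").options(header="true", inferSchema="true").load(csv_file)
```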

Regarding "python - Why can't Databricks Python read data from my Azure Datalake Storage Gen1?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57203571/
