
Inconsistency between %run and dbutils.notebook.run when calling a Databricks notebook with parameters




I am encountering an error when attempting to use dbutils.notebook.run() that I don't encounter when using the %run command in what to my eyes is an equivalent fashion. I'm hoping I am just missing something, but I can't for the life of me see what it might be.

I have a Databricks "utility" notebook (configure-storage) that configures a connection to an Azure Data Lake Storage Gen2 (ADLS) account. It takes several parameters, some of which are Key Vault secret names used to retrieve the actual secret values for configuring the storage connection:

# Notebook parameters
dbutils.widgets.text("storage_account", "")
dbutils.widgets.text("tenant_id", "")
dbutils.widgets.text("client_id", "")
dbutils.widgets.text("client_secret", "")

# Set storage account and get secrets from Key Vault
storage_account = dbutils.widgets.get("storage_account")
tenant_id = dbutils.secrets.get(scope="key-vault", key=dbutils.widgets.get("tenant_id"))
client_id = dbutils.secrets.get(scope="key-vault", key=dbutils.widgets.get("client_id"))
client_secret = dbutils.secrets.get(scope="key-vault", key=dbutils.widgets.get("client_secret"))

# Azure Data Lake Storage auth
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

For illustration/troubleshooting, in the calling notebook I am just performing a simple read of a Delta table:

file_location = "abfss://<storage-container>@<storage-account>.dfs.core.windows.net/<path-to-delta-table>"
df = spark.read.format("delta").load(file_location)
display(df)

If in the calling notebook I use the %run command as follows, the above interaction with the ADLS account works just fine:

%run "../util/configure-storage" $storage_account="storage-account-name" $tenant_id="tenant-id-secret-name" $client_id="client-id-secret-name" $client_secret="client-secret-secret-name"

However, if I use dbutils.notebook.run() as follows...

dbutils.notebook.run(
    "../util/configure-storage",
    60,
    {
        "storage_account": "storage-account-name",
        "tenant_id": "tenant-id-secret-name",
        "client_id": "client-id-secret-name",
        "client_secret": "client-secret-secret-name",
    },
)

...then the above interaction with the ADLS account results in the following error:

Py4JJavaError: An error occurred while calling o1442.load.
: Failure to initialize configuration for storage account <storage-account>.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:52)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:666)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:2055)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:267)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:225)
at com.databricks.common.filesystem.LokiFileSystem$.$anonfun$getLokiFS$1(LokiFileSystem.scala:63)
at com.databricks.common.filesystem.Cache.getOrCompute(Cache.scala:38)
at com.databricks.common.filesystem.LokiFileSystem$.getLokiFS(LokiFileSystem.scala:60)
at com.databricks.common.filesystem.LokiFileSystem.initialize(LokiFileSystem.scala:86)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at com.databricks.sql.transaction.tahoe.DeltaValidation$.validateDeltaRead(DeltaValidation.scala:102)
at org.apache.spark.sql.DataFrameReader.preprocessDeltaLoading(DataFrameReader.scala:280)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:329)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: Invalid configuration value detected for fs.azure.account.key
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.ConfigurationBasicValidator.validate(ConfigurationBasicValidator.java:49)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.Base64StringConfigurationBasicValidator.validate(Base64StringConfigurationBasicValidator.java:40)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.validateStorageAccountKey(SimpleKeyProvider.java:71)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:49)

I can certainly use %run, but I am really perplexed as to why the behavior is different when using dbutils.notebook.run() and would like to understand what I might be missing.

Recommended answer:


The main difference between %run and dbutils.notebook.run is that the former is like #include in C/C++: it includes all definitions from the referenced notebook in the current execution context, so they are available to the caller notebook. The latter executes the given notebook as a separate job, and changes made there, including Spark session configuration, aren't propagated back to the caller's execution context. That is exactly what bites you here: the spark.conf.set calls in configure-storage take effect only in the child job, so the caller's session never receives the OAuth settings, and the ABFS client falls back to the default fs.azure.account.key (shared key) authentication, producing the error you see.
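
To see the isolation concretely, here is a minimal sketch (the child notebook path ../util/set-conf and the config key my.demo.flag are hypothetical, purely for illustration):


# Hypothetical child notebook ../util/set-conf contains a single line:
#     spark.conf.set("my.demo.flag", "set-by-child")

# Running the child as a separate job: its spark.conf.set happens in the
# child's own execution context and is NOT visible here afterwards.
dbutils.notebook.run("../util/set-conf", 60)
print(spark.conf.get("my.demo.flag", "not set"))  # prints "not set"

# By contrast, %run (in its own cell) includes the child's code in THIS
# context, so the setting is visible afterwards:
# %run "../util/set-conf"
# print(spark.conf.get("my.demo.flag"))  # prints "set-by-child"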

P.S. It's really described in the documentation.
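
If you do still need dbutils.notebook.run (e.g. to run notebooks as separate, parameterized jobs, possibly in parallel), the supported channel for passing data back to the caller is the exit value: dbutils.notebook.run returns whatever string the child passes to dbutils.notebook.exit. A minimal sketch, assuming a hypothetical child notebook ../util/child:


import json

# Hypothetical child notebook ../util/child ends with:
#     dbutils.notebook.exit(json.dumps({"status": "ok", "rows_loaded": 42}))

# The caller receives that string as the return value of notebook.run;
# this is essentially the only state that crosses the job boundary.
result = json.loads(dbutils.notebook.run("../util/child", 60))
print(result["status"])  # prints "ok"

Note that exit values show up in the job run output, so avoid returning raw secret values this way. For the storage-configuration case in the question, %run is the natural fit, since the whole point is to mutate the caller's Spark session configuration.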

Comments:

Thanks for this. I don’t know why this distinction wasn’t clearer to me when I read the documentation, but you’ve helped reiterate what I was missing.
