
Inconsistency between %run and dbutils.notebook.run when calling a Databricks notebook with parameters




I am encountering an error when attempting to use dbutils.notebook.run() that I don't encounter when using the %run command in what to my eyes is an equivalent fashion. I'm hoping I am just missing something, but I can't for the life of me see what it might be.

I have a Databricks "utility" notebook (configure-storage) that configures a connection to an Azure Data Lake Storage Gen2 (ADLS) account. It takes several parameters, some of which are Key Vault secret names used to retrieve the actual secret values for configuring the storage connection:

# Notebook parameters
dbutils.widgets.text("storage_account", "")
dbutils.widgets.text("tenant_id", "")
dbutils.widgets.text("client_id", "")
dbutils.widgets.text("client_secret", "")

# Set storage account and get secrets from Key Vault
storage_account = dbutils.widgets.get("storage_account")
tenant_id = dbutils.secrets.get(scope="key-vault", key=dbutils.widgets.get("tenant_id"))
client_id = dbutils.secrets.get(scope="key-vault", key=dbutils.widgets.get("client_id"))
client_secret = dbutils.secrets.get(scope="key-vault", key=dbutils.widgets.get("client_secret"))

# Azure Data Lake Storage auth
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

For illustration/troubleshooting, in the calling notebook I am just performing a simple read of a Delta table:

file_location = "abfss://<storage-container>@<storage-account>.dfs.core.windows.net/<path-to-delta-table>"
df = spark.read.format("delta").load(file_location)
display(df)

If in the calling notebook I use the %run command as follows, the above interaction with the ADLS account works just fine:

%run "../util/configure-storage" $storage_account="storage-account-name" $tenant_id="tenant-id-secret-name" $client_id="client-id-secret-name" $client_secret="client-secret-secret-name"

However, if I use dbutils.notebook.run() as follows...

dbutils.notebook.run(
    "../util/configure-storage",
    60,
    {
        "storage_account": "storage-account-name",
        "tenant_id": "tenant-id-secret-name",
        "client_id": "client-id-secret-name",
        "client_secret": "client-secret-secret-name",
    },
)

...then the above interaction with the ADLS account results in the following error:

Py4JJavaError: An error occurred while calling o1442.load.
: Failure to initialize configuration for storage account <storage-account>.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:52)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:666)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:2055)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:267)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:225)
at com.databricks.common.filesystem.LokiFileSystem$.$anonfun$getLokiFS$1(LokiFileSystem.scala:63)
at com.databricks.common.filesystem.Cache.getOrCompute(Cache.scala:38)
at com.databricks.common.filesystem.LokiFileSystem$.getLokiFS(LokiFileSystem.scala:60)
at com.databricks.common.filesystem.LokiFileSystem.initialize(LokiFileSystem.scala:86)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at com.databricks.sql.transaction.tahoe.DeltaValidation$.validateDeltaRead(DeltaValidation.scala:102)
at org.apache.spark.sql.DataFrameReader.preprocessDeltaLoading(DataFrameReader.scala:280)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:329)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: Invalid configuration value detected for fs.azure.account.key
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.ConfigurationBasicValidator.validate(ConfigurationBasicValidator.java:49)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.Base64StringConfigurationBasicValidator.validate(Base64StringConfigurationBasicValidator.java:40)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.validateStorageAccountKey(SimpleKeyProvider.java:71)
at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:49)

I can certainly use %run, but I am really perplexed as to why the behavior is different when using dbutils.notebook.run() and would like to understand what I might be missing.

Recommended answer:


The main difference between %run and dbutils.notebook.run is that the former is like #include in C/C++: it includes all definitions from the referenced notebook in the current execution context, so they are available to the caller notebook. The latter executes the given notebook as a separate job, and changes made there, including Spark session configuration, aren't propagated back to the caller's execution context. That is exactly what bites you here: the spark.conf.set calls in configure-storage take effect only in the child job, so the caller's session never receives the OAuth settings, and the ABFS client falls back to the default fs.azure.account.key (shared key) authentication, producing the error you see.
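
To see the isolation concretely, here is a minimal sketch (the child notebook path ../util/set-conf and the config key my.demo.flag are hypothetical, purely for illustration):


# Hypothetical child notebook ../util/set-conf contains a single line:
#     spark.conf.set("my.demo.flag", "set-by-child")

# Running the child as a separate job: its spark.conf.set happens in the
# child's own execution context and is NOT visible here afterwards.
dbutils.notebook.run("../util/set-conf", 60)
print(spark.conf.get("my.demo.flag", "not set"))  # prints "not set"

# By contrast, %run (in its own cell) includes the child's code in THIS
# context, so the setting is visible afterwards:
# %run "../util/set-conf"
# print(spark.conf.get("my.demo.flag"))  # prints "set-by-child"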

P.S. It's really described in the documentation.
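
If you do still need dbutils.notebook.run (e.g. to run notebooks as separate, parameterized jobs, possibly in parallel), the supported channel for passing data back to the caller is the exit value: dbutils.notebook.run returns whatever string the child passes to dbutils.notebook.exit. A minimal sketch, assuming a hypothetical child notebook ../util/child:


import json

# Hypothetical child notebook ../util/child ends with:
#     dbutils.notebook.exit(json.dumps({"status": "ok", "rows_loaded": 42}))

# The caller receives that string as the return value of notebook.run;
# this is essentially the only state that crosses the job boundary.
result = json.loads(dbutils.notebook.run("../util/child", 60))
print(result["status"])  # prints "ok"

Note that exit values show up in the job run output, so avoid returning raw secret values this way. For the storage-configuration case in the question, %run is the natural fit, since the whole point is to mutate the caller's Spark session configuration.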

Comments:

Thanks for this. I don’t know why this distinction wasn’t clearer to me when I read the documentation, but you’ve helped reiterate what I was missing.
