python - Databricks dbutils does not list folders under a specific folder


I have three folders under one container.

Folder structure:

folder1
 |_ file1.json
 |_ file2.json
folder2
 |_ sub-folder1
 |   |_ file1.json
 |_ sub_folder2
     |_ sub-folder01
         |_ file2.json
folder3
 |_ sub-folder1
     |_ file1.json

Note: folder2 contains only folders (which may in turn contain files). I am trying to iterate over them and find a specific file name in Python code.

from pyspark.sql.functions import col, lit
from datetime import datetime

app_storage_acct_name = 'mystorageaccnt1'
app_storage_acct_scope = "{}-scope".format(app_storage_acct_name)
# app_storage_acct_key is assumed to be defined elsewhere as the name of the
# secret that holds the storage account key.

config_secret_set_url = "fs.azure.account.key.{}.blob.core.windows.net".format(app_storage_acct_name)
secret = dbutils.secrets.get(scope=app_storage_acct_scope, key=app_storage_acct_key)

# Mount the container using the account key retrieved from the secret scope.
dbutils.fs.mount(
    source="wasbs://mycontainer1@mystorageaccnt1.blob.core.windows.net",
    mount_point="/mnt/my-data-src",
    extra_configs={config_secret_set_url: secret})

dbutils.fs.ls('/mnt/my-data-src/')

The code above prints the three folders, which I can also see in Blob Storage Explorer:

Out[29]: [FileInfo(path='dbfs:/mnt/my-data-src/folder1/', name='folder1/', size=0),
FileInfo(path='dbfs:/mnt/my-data-src/folder2/', name='folder2/', size=0),
FileInfo(path='dbfs:/mnt/my-data-src/folder3/', name='folder3/', size=0)]

When I use the following, the files are listed:

dbutils.fs.ls('/mnt/my-data-src/folder1/')
Output:
Out[30]: [FileInfo(path='dbfs:/mnt/my-data-src/folder1/file1.json', name='file1.json', size=1011),
FileInfo(path='dbfs:/mnt/my-data-src....,

But when I try to list the folders under folder2 with the following command:

dbutils.fs.ls('/mnt/my-data-src/folder2/')
it fails with java.io.FileNotFoundException: File /folder2 does not exist.
ExecutionError                            Traceback (most recent call last)
<command-2660727172978602> in <module>
----> 1 dbutils.fs.ls('/mnt/my-data-src/folder2/')

/databricks/python_shell/dbruntime/dbutils.py in f_with_exception_handling(*args, **kwargs)
317 exc.__context__ = None
318 exc.__cause__ = None
--> 319 raise exc
320
321 return f_with_exception_handling

ExecutionError: An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.ls.
: java.io.FileNotFoundException: File /folder2 does not exist.
at shaded.databricks.org.apache.hadoop.fs.azure.NativeAzureFileSystem.listStatus(NativeAzureFileSystem.java:2468)
at com.databricks.backend.daemon.data.client.DBFSV2.$anonfun$listStatus$2(DatabricksFileSystemV2.scala:95)
at com.databricks.s3a.S3AExceptionUtils$.convertAWSExceptionToJavaIOException(DatabricksStreamUtils.scala:66)
at com.databricks.backend.daemon.data.client.DBFSV2.$anonfun$listStatus$1(DatabricksFileSystemV2.scala:92)
at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:395)
at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:484)
at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:504)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:266)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:261)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:258)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionContext(DatabricksFileSystemV2.scala:510)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:305)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:297)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionTags(DatabricksFileSystemV2.scala:510)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:479)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:404)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperationWithResultTags(DatabricksFileSystemV2.scala:510)
at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:395)
at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:367)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperation(DatabricksFileSystemV2.scala:510)
at com.databricks.backend.daemon.data.client.DBFSV2.listStatus(DatabricksFileSystemV2.scala:92)
at com.databricks.backend.daemon.data.client.DatabricksFileSystem.listStatus(DatabricksFileSystem.scala:150)
at com.databricks.backend.daemon.dbutils.FSUtils$.$anonfun$ls$1(DBUtilsCore.scala:154)
at com.databricks.backend.daemon.dbutils.FSUtils$.withFsSafetyCheck(DBUtilsCore.scala:91)
at com.databricks.backend.daemon.dbutils.FSUtils$.ls(DBUtilsCore.scala:153)
at com.databricks.backend.daemon.dbutils.FSUtils.ls(DBUtilsCore.scala)
at sun.reflect.GeneratedMethodAccessor223.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)

Is there any specific reason why dbutils.fs.ls() does not list a folder that contains only sub-folders in this case?

Update: I tried to access a file directly and noticed that it is of the Append Blob type. dbutils.fs.ls('/mnt/my-data-src/folder2/file.json') reports the following message:

shaded.databricks.org.apache.hadoop.fs.azure.AzureException: hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: Incorrect Blob type, please use the correct Blob type to access a blob on the server. Expected BLOCK_BLOB, actual APPEND_BLOB.

Is there any way to list blobs of type append in Databricks?

Best answer

Azure Databricks does support accessing append blobs through the Hadoop API, but only when appending to a file.

There is no workaround for this issue.

Use the Azure CLI or the Azure Storage SDK for Python to determine whether a directory contains append blobs, or whether a given object is an append blob.
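For example, here is a minimal sketch using the azure-storage-blob package that lists everything under the folder2/ prefix and prints each blob's type. The connection string (and its <account-key> placeholder) is an assumption; the account and container names are taken from the question.

# Requires the azure-storage-blob package (pip install azure-storage-blob).
from azure.storage.blob import ContainerClient

# Assumed connection string -- substitute your real account key, or pull it from a secret scope.
conn_str = ("DefaultEndpointsProtocol=https;AccountName=mystorageaccnt1;"
            "AccountKey=<account-key>;EndpointSuffix=core.windows.net")
container = ContainerClient.from_connection_string(conn_str, container_name="mycontainer1")

# blob_type distinguishes BlockBlob from AppendBlob, which dbutils.fs.ls cannot read.
for blob in container.list_blobs(name_starts_with="folder2/"):
    print(blob.name, blob.blob_type)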

You can implement a Spark SQL UDF or a custom function using the RDD API to load, read, or transform the blobs with the Azure Storage SDK for Python, as sketched below.
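As a rough sketch under the same assumptions (placeholder connection string, account and container names from the question), a plain Python download helper can be distributed over an RDD of blob names so that the append blobs are read with the SDK rather than through the mounted path:

from azure.storage.blob import ContainerClient

conn_str = ("DefaultEndpointsProtocol=https;AccountName=mystorageaccnt1;"
            "AccountKey=<account-key>;EndpointSuffix=core.windows.net")

def read_blob_text(blob_name):
    # Create the client inside the function so only the connection string is captured in the closure.
    container = ContainerClient.from_connection_string(conn_str, container_name="mycontainer1")
    return container.download_blob(blob_name).readall().decode("utf-8")

# Collect the blob names on the driver, then read the contents in parallel with the RDD API.
# sc is the SparkContext that a Databricks notebook provides.
client = ContainerClient.from_connection_string(conn_str, container_name="mycontainer1")
names = [b.name for b in client.list_blobs(name_starts_with="folder2/")]
contents = sc.parallelize(names).map(read_blob_text).collect()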

Official documentation is available for this issue.

Regarding "python - Databricks dbutils does not list folders under a specific folder", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/70469975/
