
azure - Spark: "One of the request inputs is not valid" when writing a Spark DataFrame to Azure

Reposted · Author: 行者123 · Updated: 2023-12-03 06:52:23

I can read data from Azure Blob Storage, but writing back to Azure Storage throws the error below. I am running the program on my local machine. Can someone help me resolve this?

My code

val config = new SparkConf()

val spark = SparkSession.builder()
  .appName("AzureConnector")
  .config(config)
  .master("local[*]")
  .getOrCreate()

try {
  spark.sparkContext.hadoopConfiguration.set("fs.azure",
    "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
  spark.sparkContext.hadoopConfiguration.set("fs.wasbs.impl",
    "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
  spark.sparkContext.hadoopConfiguration.set(
    "fs.azure.account.key.**myaccount**.blob.core.windows.net",
    "**mykey**")

  val csvDf = spark.read.csv("wasbs://workspaces@myaccount.blob.core.windows.net/test/test.csv")
  csvDf.show()
  csvDf.coalesce(1).write.format("csv").mode("append")
    .save("wasbs://workspaces@myaccount.blob.core.windows.net/test/output")
} catch {
  case e: Exception =>
    e.printStackTrace()
}

Error

org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2482)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem$FolderRenamePending.execute(NativeAzureFileSystem.java:424)
	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.rename(NativeAzureFileSystem.java:1997)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:435)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:415)
	at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
	at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
	at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:153)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:260)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
	at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:162)
	at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307)
	at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:177)
	at com.microsoft.azure.storage.blob.CloudBlob.startCopyFromBlob(CloudBlob.java:764)
	at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobWrapperImpl.startCopyFromBlob(StorageInterfaceImpl.java:399)
	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2449)
	... 19 more

Best Answer

The WASB driver is not supported as a client for storage accounts with a hierarchical namespace enabled. Instead, we recommend that you use the Azure Blob File System (ABFS) driver in your Hadoop environment. (Source)

Check whether your storage account is ADLS Gen2 (hierarchical namespace enabled).
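If it is, a minimal sketch of the same read/write switched from WASB to the ABFS driver might look like the following. This assumes a Hadoop 3.2+ `hadoop-azure` dependency on the classpath; the account name, container, and key (`myaccount`, `workspaces`, `**mykey**`) are the placeholders from the question, not real values.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("AzureConnector")
  .master("local[*]")
  .getOrCreate()

// ABFS talks to the *.dfs.core.windows.net endpoint of the
// ADLS Gen2 account, not *.blob.core.windows.net as WASB does.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.myaccount.dfs.core.windows.net",
  "**mykey**")

// abfss:// uses TLS; the URL layout mirrors the wasbs:// one.
val csvDf = spark.read.csv(
  "abfss://workspaces@myaccount.dfs.core.windows.net/test/test.csv")
csvDf.show()
csvDf.coalesce(1).write.format("csv").mode("append")
  .save("abfss://workspaces@myaccount.dfs.core.windows.net/test/output")
```

Unlike WASB, ABFS implements rename as a single atomic metadata operation on hierarchical-namespace accounts, which avoids the blob-copy rename path that fails in the stack trace above.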

Regarding azure - Spark: "One of the request inputs is not valid" when writing a Spark DataFrame to Azure, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/73469170/
