
scala - Reading a file from Azure Data Lake Storage V2 with Spark 2.4


I'm trying to read a simple csv file from Azure Data Lake Storage V2 with Spark 2.4, in IntelliJ IDEA on a Mac.

Code below:

package com.example

import org.apache.spark.SparkConf
import org.apache.spark.sql._

object Test extends App {

  val appName: String = "DataExtract"
  val master: String = "local[*]"
  val sparkConf: SparkConf = new SparkConf()
    .setAppName(appName)
    .setMaster(master)
    .set("spark.scheduler.mode", "FAIR")
    .set("spark.sql.session.timeZone", "UTC")
    .set("spark.sql.shuffle.partitions", "32")
    .set("fs.defaultFS", "abfs://development@xyz.dfs.core.windows.net/")
    .set("fs.azure.account.key.xyz.dfs.core.windows.net", "~~key~~")

  val spark: SparkSession = SparkSession
    .builder()
    .config(sparkConf)
    .getOrCreate()
  spark.time(run(spark))

  def run(spark: SparkSession): Unit = {
    val df = spark.read.csv("abfs://development@xyz.dfs.core.windows.net/development/sales.csv")
    df.show(10)
  }

}
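(As an aside, and not part of the original question: Hadoop filesystem keys set directly on `SparkConf`, as above, are not guaranteed to reach the Hadoop `Configuration` that the `abfs://` filesystem reads. The documented route is the `spark.hadoop.` prefix, which Spark copies into the Hadoop configuration. A configuration-only sketch, reusing the question's placeholder account `xyz` and key `~~key~~`:)

```scala
// Hypothetical configuration sketch (not from the question): prefix Hadoop/ABFS
// keys with "spark.hadoop." so Spark propagates them to the Hadoop Configuration
// used by the abfs:// filesystem client.
val sparkConf = new org.apache.spark.SparkConf()
  .setAppName("DataExtract")
  .setMaster("local[*]")
  .set("spark.hadoop.fs.defaultFS", "abfs://development@xyz.dfs.core.windows.net/")
  .set("spark.hadoop.fs.azure.account.key.xyz.dfs.core.windows.net", "~~key~~")
```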

The read fails and throws a security exception:

Exception in thread "main" java.lang.NullPointerException
at org.wildfly.openssl.CipherSuiteConverter.toJava(CipherSuiteConverter.java:284)
at org.wildfly.openssl.OpenSSLEngine.toJavaCipherSuite(OpenSSLEngine.java:1094)
at org.wildfly.openssl.OpenSSLEngine.getEnabledCipherSuites(OpenSSLEngine.java:729)
at org.wildfly.openssl.OpenSSLContextSPI.getCiphers(OpenSSLContextSPI.java:333)
at org.wildfly.openssl.OpenSSLContextSPI$1.getSupportedCipherSuites(OpenSSLContextSPI.java:365)
at org.apache.hadoop.fs.azurebfs.utils.SSLSocketFactoryEx.<init>(SSLSocketFactoryEx.java:105)
at org.apache.hadoop.fs.azurebfs.utils.SSLSocketFactoryEx.initializeDefaultFactory(SSLSocketFactoryEx.java:72)
at org.apache.hadoop.fs.azurebfs.services.AbfsClient.<init>(AbfsClient.java:79)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:817)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:149)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:108)

Can anyone help me figure out what's going wrong?

Best Answer

Based on my research, you get this error message when your jars are incompatible with your Hadoop version.

I'd suggest you look into the following issues:

http://mail-archives.apache.org/mod_mbox/spark-issues/201907.mbox/%3CJI[email protected]%3E

https://issues.apache.org/jira/browse/HADOOP-16410
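(A sketch going beyond the linked answer, under the assumption that HADOOP-16410 applies here: the `NullPointerException` in `org.wildfly.openssl.CipherSuiteConverter` is commonly reported when `hadoop-azure` pulls in an old `wildfly-openssl` release, and overriding it to a newer version is the usual workaround. Versions below are illustrative, not confirmed by the answer; match `hadoop-azure` to the Hadoop version your Spark build targets.)

```scala
// build.sbt (sketch) -- hypothetical versions, align with your Spark/Hadoop build.
libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-sql"      % "2.4.4",
  "org.apache.hadoop"  %  "hadoop-azure"   % "3.2.0",
  // Older wildfly-openssl releases trigger the NPE in CipherSuiteConverter;
  // pinning a newer release is the commonly reported fix (see HADOOP-16410).
  "org.wildfly.openssl" % "wildfly-openssl" % "1.0.7.Final"
)
```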

We found a similar question about "scala - Reading a file from Azure Data Lake Storage V2 with Spark 2.4" on Stack Overflow: https://stackoverflow.com/questions/63195365/
