apache-spark - Getting "org.apache.spark.sql.AnalysisException: Path does not exist" from SparkSession.read()

Reposted. Author: 行者123. Updated: 2023-12-04 15:51:54

I am trying to read a file that is submitted with spark-submit to a YARN cluster in client mode. Putting the file into HDFS is not an option. Here is what I did:

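import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

def main(args: Array[String]): Unit = {
  if (args != null && args.length > 0) {
    val spark = SparkSession.builder().getOrCreate()
    val inputfile: String = args(0)

    // Get the file name, e.g. train.csv
    val input_filename = inputfile.split("/").toList.last

    // Resolve the path of the file shipped with --files and read it as CSV
    val d = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(SparkFiles.get(input_filename))
    d.show()
  }
}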

and submitted it to YARN like this:

spark2-submit \
--class "com.example.HelloWorld" \
--master yarn --deploy-mode client \
--files repo/data/train.csv \
--driver-cores 2 helloworld-assembly-0.1.jar repo/data/train.csv

But I got an exception:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://xxxxx.xxxxx.xxxx.com:8020/tmp/spark-db3ee991-7f3d-427c-8479-aa212f906dc5/userFiles-040293ee-0d1f-44dd-ad22-ef6fe729bd49/train.csv; 

I also tried:

// spark is the active SparkSession
val input_filename_1 = "file://" + SparkFiles.get(input_filename)
println(input_filename_1)

spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(input_filename_1)

I still get a similar error:

 file:///tmp/spark-fbd46e9d-c450-4f86-8b23-531e239d7b98/userFiles-8d129eb3-7edc-479d-aeda-2da98432fc50/train.csv
Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: file:/tmp/spark-fbd46e9d-c450-4f86-8b23-531e239d7b98/userFiles-8d129eb3-7edc-479d-aeda-2da98432fc50/train.csv;
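
One possible workaround, sketched under assumptions rather than confirmed by the original post: in client mode the driver runs on the submitting machine, so the file can be read there with plain Scala I/O and handed to Spark as a Dataset[String]. No executor then has to open the path. This assumes Spark 2.2 or later (for the csv overload that takes a Dataset[String]), a file small enough to fit in driver memory, and no quoted fields that span several lines. The object name LocalCsvExample is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import scala.io.Source

// Hypothetical example object; not part of the original question.
object LocalCsvExample {
  def main(args: Array[String]): Unit = {
    require(args.nonEmpty, "usage: LocalCsvExample <local-csv-path>")
    val spark = SparkSession.builder().appName("LocalCsvExample").getOrCreate()
    import spark.implicits._

    // Read the file on the driver with plain Scala I/O.
    // Assumption: the whole file fits in driver memory.
    val source = Source.fromFile(args(0))
    val lines = try source.getLines().toList finally source.close()

    // Ship the lines to the cluster as a Dataset[String] and parse them as CSV.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(lines.toDS())
    df.show()

    spark.stop()
  }
}
```

With this approach the input file would be passed only as the program argument (repo/data/train.csv), and the --files option should not be needed for it.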
