java - File not found when running a Spark job with input from a Google Storage bucket

Reposted. Author: 行者123. Updated: 2023-12-01 13:47:25

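I am running a job on a Google Cloud Dataproc cluster. The job takes one argument: the path to the input file. This file is stored in a Google Cloud Storage bucket. I get a FileNotFoundException (trace below). Why does this happen?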

gcloud dataproc jobs submit spark --cluster cluster-1 --class MST.ComputeMST \
--jars gs://dataproc-211700eb-83ed-456d-a67e-98af9e6fa02d-us/ComputeMST.jar \
-- gs:///dataproc-211700eb-83ed-456d-a67e-98af9e6fa02d-us/input.txt

Job [8b193fcd-1350-462b-ae11-373333e868fe] submitted.
Waiting for job output...
17/05/16 05:06:02 INFO com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase: GHFS version: 1.6.1-hadoop2
number of runs = 0
Exception in thread "main" java.io.FileNotFoundException: gs:/dataproc-211700eb-83ed-456d-a67e-98af9e6fa02d-us/input.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at java.io.FileReader.<init>(FileReader.java:58)
at MST.ComputeMST.main(ComputeMST.java:670)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
ERROR: (gcloud.dataproc.jobs.submit.spark) Job [8b193fcd-1350-462b-ae11-373333e868fe] entered state [ERROR] while waiting for [DONE].

Best answer

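Even though the GCS connector is installed by default on Cloud Dataproc clusters, you cannot use it in your job through java.io.FileReader. That class only opens files on the local file system, so it treats the gs:// argument as a local path, which does not exist on the node.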
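
The exception message in the trace shows gs:/ with a single slash, which is consistent with the argument being handled as a local path. A minimal sketch of that behaviour, assuming a Unix-like system such as a Dataproc node (the class name LocalPathDemo, the helper asLocalPath, and the bucket name my-bucket are hypothetical):

```java
import java.io.File;

class LocalPathDemo {
    // Returns the path string that java.io.File keeps for the given input.
    // FileReader(String) goes through the same normalization before opening.
    static String asLocalPath(String input) {
        return new File(input).getPath();
    }

    public static void main(String[] args) {
        // "my-bucket" is a placeholder, not the bucket from the question.
        String arg = "gs://my-bucket/input.txt";
        // On Unix-like systems, repeated separators are collapsed, so the
        // URI scheme is lost and only a relative local path remains.
        System.out.println(asLocalPath(arg));
        // No local file with that name exists, hence FileNotFoundException.
        System.out.println(new File(arg).exists());
    }
}
```

Because "gs:" ends up as an ordinary directory name in a relative path, the GCS connector is never involved in the lookup.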

To access GCS objects through the GCS connector, you need to use Hadoop's FileSystem interface.
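
As a rough illustration, reading the input through Hadoop's FileSystem API could look like the sketch below. It is not the asker's actual code: the class name GcsInputReader is hypothetical, the argument is assumed to have the form gs://<bucket>/<object>, and the Hadoop client libraries and GCS connector are assumed to be on the classpath, as they are on a Dataproc cluster.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class GcsInputReader {
    public static void main(String[] args) throws IOException {
        // Assumed form: gs://<bucket>/<object>
        String input = args[0];

        // Picks up the cluster's Hadoop configuration from the classpath.
        Configuration conf = new Configuration();

        // Resolves the FileSystem implementation registered for the URI
        // scheme; for gs:// this is expected to be the GCS connector.
        // The instance is cached by Hadoop, so it is not closed here.
        FileSystem fs = FileSystem.get(URI.create(input), conf);

        // fs.open returns an InputStream, so the rest of the reading code
        // can stay close to what a FileReader-based version would do.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path(input)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

In a Spark job, the Hadoop configuration held by the SparkContext could be used in place of new Configuration(), and if the input is meant to be processed in parallel, loading it through Spark's own file-reading methods is another option.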

Regarding "java - File not found when running a Spark job with input from a Google Storage bucket", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43993097/
