
hadoop - Spark-Hadoop -> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist

Reposted. Author: 可可西里. Updated: 2023-11-01 14:16:38

I get an error when trying to read a file from HDFS into Spark. The file README.md exists in HDFS:

[spark@osboxes hadoop]$ hdfs dfs -ls README.md
16/02/26 00:29:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r-- 1 spark supergroup 4811 2016-02-25 23:38 README.md

In the Spark shell, I ran:

scala> val readme = sc.textFile("hdfs://localhost:9000/README.md")
readme: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:27

scala> readme.count
16/02/26 00:25:26 DEBUG BlockManager: Getting local block broadcast_4
16/02/26 00:25:26 DEBUG BlockManager: Level for block broadcast_4 is StorageLevel(true, true, false, true, 1)
16/02/26 00:25:26 DEBUG BlockManager: Getting block broadcast_4 from memory
16/02/26 00:25:26 DEBUG HadoopRDD: Creating new JobConf and caching it for later re-use
16/02/26 00:25:26 DEBUG Client: The ping interval is 60000 ms.
16/02/26 00:25:26 DEBUG Client: Connecting to localhost/127.0.0.1:9000
16/02/26 00:25:26 DEBUG Client: IPC Client (648679508) connection to localhost/127.0.0.1:9000 from spark: starting, having connections 1
16/02/26 00:25:26 DEBUG Client: IPC Client (648679508) connection to localhost/127.0.0.1:9000 from spark sending #4
16/02/26 00:25:26 DEBUG Client: IPC Client (648679508) connection to localhost/127.0.0.1:9000 from spark got value #4
16/02/26 00:25:26 DEBUG ProtobufRpcEngine: Call: getFileInfo took 6ms
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/README.md
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD.count(RDD.scala:1143)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
at $iwC$$iwC$$iwC.<init>(<console>:43)
at $iwC$$iwC.<init>(<console>:45)
at $iwC.<init>(<console>:47)
at <init>(<console>:49)
at .<init>(<console>:53)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


scala> 16/02/26 00:25:36 DEBUG Client: IPC Client (648679508) connection to localhost/127.0.0.1:9000 from spark: closed
16/02/26 00:25:36 DEBUG Client: IPC Client (648679508) connection to localhost/127.0.0.1:9000 from spark: stopped, remaining connections 0

In core-site.xml I have the following entry:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

and hdfs-site.xml has the following:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Am I missing something here? My OS is CentOS Linux release 7.2.1511 (Core), Hadoop is 2.7.2, and Spark is 1.6.0-bin-hadoop2.6.
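The listing above uses a relative path (README.md), while the Spark call uses an absolute one (hdfs://localhost:9000/README.md). In HDFS a relative path is resolved against the user's home directory, which by default is /user/<name>. The sketch below approximates that qualification step with java.net.URI so the two locations can be compared side by side. It is not Hadoop's own code: `qualify` is a hypothetical helper, and it assumes the default /user/<name> home-directory layout and that the shell user is `spark` (as the prompt and the file owner suggest).

```scala
import java.net.URI

// Hypothetical helper (not a Hadoop API): approximates how a path is qualified
// against fs.defaultFS and the user's HDFS home directory, assumed here to be
// /user/<userName> (the usual default).
def qualify(defaultFs: String, userName: String, path: String): String = {
  // The trailing slash makes relative paths resolve *inside* the home directory
  val home = new URI(defaultFs.stripSuffix("/") + "/user/" + userName + "/")
  // URI.resolve leaves absolute URIs untouched and keeps the scheme/authority
  // of the base for paths that start with "/"
  home.resolve(path).toString
}

val defaultFs = "hdfs://localhost:9000" // fs.defaultFS from core-site.xml

// Relative path, as used by `hdfs dfs -ls README.md`
println(qualify(defaultFs, "spark", "README.md"))
// Absolute path, as used by sc.textFile("hdfs://localhost:9000/README.md")
println(qualify(defaultFs, "spark", "hdfs://localhost:9000/README.md"))
// Root-relative path, equivalent to the previous one
println(qualify(defaultFs, "spark", "/README.md"))
```

Under these assumptions the first call yields a path under /user/spark, while the other two point at the HDFS root. Running `hdfs dfs -ls /README.md` (with a leading slash) checks the location Spark was actually asked to read.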

Best answer

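This happens because of the internal mapping between directories. First, go to the directory where the file (README.md) is saved and run:

df -k .

This gives you the actual mount point of the directory, for example /xyz. Now look for your file (README.md) under this mount point, for example /xyz/home/omi/myDir/README.md, and use that path in your code:

val readme = sc.textFile("/xyz/home/omi/myDir/README.md")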
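In the question's setup fs.defaultFS is hdfs://localhost:9000, so a path without a scheme passed to sc.textFile is looked up in HDFS, not on the local disk. If the file located under the mount point is a local file, one option is to pass a fully qualified file:// URI instead. The sketch below assumes a Linux host; `toLocalUri` is a hypothetical helper, and the path is the answer's example path.

```scala
import java.nio.file.{Files, Paths}

// Local path found via the `df -k .` approach (example path from the answer)
val localPath = "/xyz/home/omi/myDir/README.md"

// Hypothetical helper: turns an absolute local path into a file:// URI so the
// path is not resolved against fs.defaultFS (hdfs://localhost:9000 here)
def toLocalUri(path: String): String = Paths.get(path).toUri.toString

val uri = toLocalUri(localPath)
// Quick sanity check that the file is really on the local filesystem
println(s"exists locally: ${Files.exists(Paths.get(localPath))}")
// In the Spark shell this string would then be passed to sc.textFile(uri)
println(uri)
```

Note that when Spark runs on more than one machine, a local file read this way must be available at the same path on every worker node.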

Regarding "hadoop - Spark-Hadoop -> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35641284/
