
hadoop - Nutch: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist

Reposted. Author: 可可西里. Updated: 2023-11-01 16:17:12

When I run the nutch command to create the crawldb folder and its contents:

soporte@CNEOSYLAP /usr/local/apache-nutch-2.2.1/runtime/local
$ bin/nutch crawl urls -dir crawl -depth 3 -topN 5

I get this error:

InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/C:/cygwin/usr/local/apache-nutch-2.2.1/runtime/local/crawl
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:50)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233)
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

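I am using apache-nutch-2.2.1, hadoop-0.20.2-core.jar, hbase-0.90.4.jar, and Cygwin setup 2.774.

I have not installed Hadoop. The Nutch installation only contains the Hadoop libraries, so this is a local Nutch installation, not a distributed one.

Any ideas? Thanks in advance!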

Edit:

When I create the directory manually, I get a different error:

soporte@CNEOSYLAP /usr/local/apache-nutch-2.2.1/runtime/local
$ mkdir crawl

soporte@CNEOSYLAP /usr/local/apache-nutch-2.2.1/runtime/local
$ chmod 777 crawl

soporte@CNEOSYLAP /usr/local/apache-nutch-2.2.1/runtime/local
$ bin/nutch crawl urls -dir crawl -depth 3 -topN 5
cygpath: can't convert empty path
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
Exception in thread "main" java.lang.RuntimeException: job failed: name=inject crawl, jobid=null
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233)
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)

Best answer

If you want to use -dir crawl, you first need to create the folder file:/C:/cygwin/usr/local/apache-nutch-2.2.1/runtime/local/crawl.
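
As a rough sketch of that advice, the check could be scripted so the directory always exists before the crawl starts. The function name ensure_crawl_dir is hypothetical and not part of Nutch; the nutch command line is copied from the question. Note that in the question's edit, creating the directory by hand led to a different "job failed" error, so this only addresses the missing-path exception.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): make sure the directory passed
# to -dir exists before the crawl is started.
ensure_crawl_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        # -p also creates missing parent directories.
        mkdir -p "$dir" || return 1
    fi
    # Print the absolute path so it can be compared with the path shown in
    # the "Input path does not exist" message.
    (cd "$dir" && pwd)
}

# Usage, run from runtime/local (command taken from the question):
#   ensure_crawl_dir crawl && bin/nutch crawl urls -dir crawl -depth 3 -topN 5
```

Printing the resolved path is only a convenience: under Cygwin the shell path and the file:/C:/... path reported by Hadoop look different, so seeing both side by side may help confirm they refer to the same folder.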

Regarding "hadoop - Nutch: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/17959597/
