
java - hadoop wordcount with java

Reposted. Author: 可可西里. Updated: 2023-11-01 16:24:47

Hi everyone, I am new to Hadoop. This is my first program, and I need help with the following error.

When I put my file into HDFS directly, without using hdfs://localhost:9000/, I get a "dir not exist" error message.

So I put the file into HDFS like this:

hadoop fs -put file.txt  hdfs://localhost:9000/sawai.txt

After that, the file was loaded into HDFS:

File added successfully

  1. OK, then I tried to run my wordcount jar program like this:

    hadoop jar wordcount.jar hdp.WordCount sawai.txt outputpath

    I get the following error message:

    org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/hadoop_usr/sawai.txt
  2. Then I tried another way, specifying the HDFS paths explicitly like this:

    hadoop jar wordcount.jar hdp.WordCount hdfs://localhost:9000/sawai.txt hdfs://localhost:9000/outputdir

    I get the following error message:

    org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/sawai.txt already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870)
    at hdp.WordCount.run(WordCount.java:40)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at hdp.WordCount.main(WordCount.java:17)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

I have read many articles, and they suggest changing the output directory name on every run. I tried that, but it does not work in my case, and the problem seems to lie in how the source file I want to operate on is specified.

What is causing the exception, and how can I fix it?

Best Answer

I haven't seen your complete program with its input/output handling yet....

I assume sawai.txt is the input file you want to count words in. Why are you copying it to the output?
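If the driver happens to wire the arguments the other way round, that would explain why the exception names your input file as the output directory. Here is a minimal sketch of the expected wiring, assuming the old mapred API with a JobConf (the JobConf construction is illustrative, not taken from your program):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

// Illustrative driver fragment: args[0] must feed the input format and
// args[1] the output format, never the other way round.
JobConf conf = new JobConf(WordCount.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));   // e.g. sawai.txt
FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // e.g. outputdir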

However, see this example and add it to your driver: if the output path exists, delete it, so you won't get a FileAlreadyExistsException.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/* Provides access to configuration parameters */
Configuration conf = new Configuration();
/* Create a FileSystem object from the configuration */
FileSystem fs = FileSystem.get(conf);
/* Check whether the output path (args[1]) exists */
if (fs.exists(new Path(args[1]))) {
    /* If it exists, delete the output path */
    fs.delete(new Path(args[1]), true);
}
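Note that fs.delete(new Path(args[1]), true) deletes the directory recursively. Place this check in the driver just before the job is submitted (before JobClient.runJob(conf) in your stack trace), and only ever run it against the output path, never against the input file.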

Regarding java - hadoop wordcount with java, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41268932/
