
java - FileNotFoundException when reading a file from the Hadoop distributed cache


I'm having a problem running a Hadoop job: I get a FileNotFoundException when trying to retrieve a file from the distributed cache, even though the file exists. When I run the job on my local file system, it works.

The cluster is hosted on Amazon Web Services, using Hadoop version 1.0.4 and Java version 1.7. I don't have any control over the cluster or how it is set up.

In the main function I add the file to the distributed cache. This seems to work fine; at least it doesn't throw any exceptions.

....
JobConf conf = new JobConf(Driver.class);
conf.setJobName("mean");
conf.set("lookupfile", args[2]);
Job job = new Job(conf);
DistributedCache.addCacheFile(new Path(args[2]).toUri(), conf);
...

In the setup function, which is called before map, I create a Path for the file and then call a function that loads the file into a hash map.
Configuration conf = context.getConfiguration();
String inputPath = conf.get("lookupfile");
Path dataFile = new Path(inputPath);
loadHashMap(dataFile, context);
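For context, the loadHashMap helper at Map.java:49 in the stack trace below presumably parses the CSV into key/value pairs along these lines. This is a hypothetical reconstruction: the two-column layout of DATA.csv and the helper's signature are assumptions, but it reproduces the failing pattern, since FileReader opens a path on the task node's local file system.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class LookupLoader {
    // Reads a two-column CSV (key,value) into a map; the layout is an assumption.
    public static Map<String, String> loadHashMap(String filePath) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        // This is the line that throws: FileReader opens a *local* path, so an
        // HDFS path such as /tmp/DATA.csv will not exist on the task node.
        try (BufferedReader brReader = new BufferedReader(new FileReader(filePath))) {
            String line;
            while ((line = brReader.readLine()) != null) {
                String[] parts = line.split(",", 2);
                if (parts.length == 2) {
                    lookup.put(parts[0].trim(), parts[1].trim());
                }
            }
        }
        return lookup;
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny sample file and load it, to show the expected format.
        java.nio.file.Path tmp = java.nio.file.Files.createTempFile("DATA", ".csv");
        java.nio.file.Files.write(tmp, "a,1\nb,2\n".getBytes());
        Map<String, String> m = loadHashMap(tmp.toString());
        System.out.println(m.get("a") + " " + m.get("b")); // prints "1 2"
    }
}
```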

The exception occurs on the first line of the function that loads the hash map.
brReader = new BufferedReader(new FileReader(filePath.toString()));

I start the job like this:
hadoop jar Driver.jar Driver /tmp/input output /tmp/DATA.csv

I get the following error:
Error: Found class org.apache.hadoop.mapreduce.Counter, but interface was expected
attempt_201410300715_0018_m_000000_0: java.io.FileNotFoundException: /tmp/DATA.csv (No such file or directory)
attempt_201410300715_0018_m_000000_0: at java.io.FileInputStream.open(Native Method)
attempt_201410300715_0018_m_000000_0: at java.io.FileInputStream.<init>(FileInputStream.java:146)
attempt_201410300715_0018_m_000000_0: at java.io.FileInputStream.<init>(FileInputStream.java:101)
attempt_201410300715_0018_m_000000_0: at java.io.FileReader.<init>(FileReader.java:58)
attempt_201410300715_0018_m_000000_0: at Map.loadHashMap(Map.java:49)
attempt_201410300715_0018_m_000000_0: at Map.setup(Map.java:98)
attempt_201410300715_0018_m_000000_0: at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
attempt_201410300715_0018_m_000000_0: at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:771)
attempt_201410300715_0018_m_000000_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
attempt_201410300715_0018_m_000000_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
attempt_201410300715_0018_m_000000_0: at java.security.AccessController.doPrivileged(Native Method)
attempt_201410300715_0018_m_000000_0: at javax.security.auth.Subject.doAs(Subject.java:415)
attempt_201410300715_0018_m_000000_0: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1140)
attempt_201410300715_0018_m_000000_0: at org.apache.hadoop.mapred.Child.main(Child.java:253)
14/11/01 02:12:49 INFO mapred.JobClient: Task Id : attempt_201410300715_0018_m_000001_0, Status : FAILED

I have verified that the file exists, both in HDFS and on the local file system.
hadoop@hostname:~$ hadoop fs -ls /tmp
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2014-10-30 11:19 /tmp/input
-rw-r--r-- 1 hadoop supergroup 428796 2014-10-30 11:19 /tmp/DATA.csv

hadoop@hostname:~$ ls -al /tmp/
-rw-r--r-- 1 hadoop hadoop 428796 Oct 30 11:30 DATA.csv

Honestly, I can't see what is wrong here. The exception lists the correct path for the file, and I have verified that the file exists on both HDFS and the local file system. Is there something I'm missing?

Best Answer

The input to the BufferedReader should come from one of the paths returned by DistributedCache.getLocalCacheFiles() in setup() — those are the local copies of the cached files on the task node. More like:

Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
if (localFiles != null && localFiles.length > 0) {
brReader = new BufferedReader(new FileReader(localFiles[0].toString()));
}
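Since more than one file can be placed in the distributed cache, it is also safer to match the local paths by file name than to blindly take index 0. A minimal sketch, using a hypothetical pickByName helper that works on plain path strings (so it applies to the toString() of each cached Path):

```java
public class CachePaths {
    // Returns the first path whose last component equals fileName, or null.
    public static String pickByName(String[] localPaths, String fileName) {
        if (localPaths == null) {
            return null; // getLocalCacheFiles can return null if nothing was cached
        }
        for (String p : localPaths) {
            int slash = p.lastIndexOf('/');
            String name = (slash >= 0) ? p.substring(slash + 1) : p;
            if (name.equals(fileName)) {
                return p;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Local cache paths like these are illustrative, not real cluster paths.
        String[] cached = {
            "/mnt/taskTracker/archive/OTHER.txt",
            "/mnt/taskTracker/archive/DATA.csv"
        };
        System.out.println(pickByName(cached, "DATA.csv"));
        // prints "/mnt/taskTracker/archive/DATA.csv"
    }
}
```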

Regarding "java - FileNotFoundException when reading a file from the Hadoop distributed cache", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/26685752/
