
hadoop - IOException when reading from the distributed cache on the Hadoop file system?

Reposted. Author: 行者123. Updated: 2023-12-02 21:59:28

I followed the tutorial here on using the distributed cache. I made minor changes to the code to make it compatible with Hadoop 2.2.

I found that when the loadStopWords method is called, an IOException is thrown:

I confirmed that stop_words.txt was copied to HDFS.
I have omitted the mapper and reducer code to keep things simple here.

Here is my code:

public static final String LOCAL_STOPWORD_LIST =
        "/Users/sridhar/Documents/hadoop/stop_words.txt";

public static final String HDFS_STOPWORD_LIST = "/data/stop_words.txt";

// Copies the local file to HDFS and adds it to the job's cache files.
static void cacheStopWordList(Configuration conf, Job job)
        throws IOException, URISyntaxException {
    FileSystem fs = FileSystem.get(conf);
    URI hdfsPath = new URI(HDFS_STOPWORD_LIST);

    System.out.println("copying files to HDFS");

    // Upload the file to HDFS, overwriting any existing copy.
    fs.copyFromLocalFile(false, true, new Path(LOCAL_STOPWORD_LIST),
            new Path(hdfsPath));

    System.out.println("done copying to HDFS");
    job.addCacheFile(hdfsPath);
}

protected void setup(Context context) {
    try {
        String stopwordCacheName = new Path(HDFS_STOPWORD_LIST).toString();
        URI[] cacheFiles = context.getCacheFiles();

        System.out.println(Arrays.toString(cacheFiles));

        if (null != cacheFiles && cacheFiles.length > 0) {
            for (URI cacheURI : cacheFiles) {
                System.out.println(cacheURI.toString());
                System.out.println(stopwordCacheName);
                System.out.println("-----------------");
                if (cacheURI.toString().equals(stopwordCacheName)) {
                    System.out.println("****************************************");
                    loadStopWords(new Path(cacheURI)); // IT BREAKS HERE
                    System.out.println(stopWords);
                    break;
                }
            }
        }
    } catch (IOException ioe) {
        System.err.println("IOException reading from distributed cache");
        System.err.println(ioe.toString());
    }
}

void loadStopWords(Path cachePath) throws IOException {
    // Note the use of regular java.io methods here - this is a local file now.
    BufferedReader wordReader = new BufferedReader(
            new FileReader(cachePath.toString()));
    try {
        String line;
        this.stopWords = new HashSet<String>();
        while ((line = wordReader.readLine()) != null) {
            this.stopWords.add(line.toLowerCase());
        }
    } finally {
        wordReader.close();
    }
}





public static void main(String[] args)
        throws IllegalArgumentException, IOException, InterruptedException,
        ClassNotFoundException, URISyntaxException {
    Job job = new Job();
    job.setJarByClass(LineIndexer.class);
    job.setJobName("LineIndexer");
    Configuration conf = job.getConfiguration();
    cacheStopWordList(conf, job);
}

Best answer

I think you should try Path[] localPaths = context.getLocalCacheFiles(); instead of context.getCacheFiles();. Let me know whether it works.
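The likely reason this helps: loadStopWords opens the path with java.io.FileReader, which can only read local files, while the URIs returned by context.getCacheFiles() point at the HDFS copies. getLocalCacheFiles() instead returns the node-local paths the framework has already downloaded. The loading logic itself is fine once it is handed a genuinely local path; here is a minimal, Hadoop-free sketch of that same logic (the class name, temp file, and its contents are made up for illustration):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class StopWordsDemo {
    // Same logic as loadStopWords in the question, but taking a plain local path.
    static Set<String> loadStopWords(String localPath) throws IOException {
        Set<String> stopWords = new HashSet<>();
        try (BufferedReader wordReader = new BufferedReader(new FileReader(localPath))) {
            String line;
            while ((line = wordReader.readLine()) != null) {
                stopWords.add(line.toLowerCase());
            }
        }
        return stopWords;
    }

    public static void main(String[] args) throws IOException {
        // A throwaway file standing in for the locally-cached stop_words.txt.
        Path tmp = Files.createTempFile("stop_words", ".txt");
        Files.write(tmp, Arrays.asList("The", "a", "AND"));

        Set<String> words = loadStopWords(tmp.toString());
        System.out.println(words.contains("the")); // entries are lower-cased on load
        System.out.println(words.size());
    }
}
```

In a mapper's setup, the corresponding change would be to pass one of the Path values from getLocalCacheFiles() (rather than a URI from getCacheFiles()) into this kind of loader.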

Regarding "hadoop - IOException when reading from the distributed cache on the Hadoop file system?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24621083/
