
java - Java: Download a .zip file from FTP and extract its contents without saving the file on the local system

Reposted. Author: 行者123. Updated: 2023-12-02 21:26:45

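I have a requirement to download certain .zip files from an FTP server and push the contents of the archives (some XML files) to HDFS (Hadoop Distributed File System). So far I use the Apache FTPClient to connect to the FTP server and first download each file to the local machine. I then unzip it and pass the folder path to a method that iterates over the local folder and pushes the files to HDFS. For clarity, I have attached the relevant code below.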

// Gives me an active FTPClient
FTPClient ftpCilent = getActiveFTPConnection();
ftpCilent.changeWorkingDirectory(remoteDirectory);

FTPFile[] ftpFiles = ftpCilent.listFiles();
if (ftpFiles.length <= 0) {
    logger.info("Unable to find any files in given location!!");
    return;
}
// Iterate files
for (FTPFile eachFTPFile : ftpFiles) {
    String ftpFileName = eachFTPFile.getName();

    // Skip anything that is not a .zip file
    if (!ftpFileName.endsWith(".zip")) {
        continue;
    }

    System.out.println("Reading File -->" + ftpFileName);
    /*
     * location is the path on the local system given by the user,
     * usually loaded from a property file.
     *
     * archiveFileLocation is where the archive downloaded
     * from FTP is stored.
     */
    String archiveFileLocation = location + File.separator + ftpFileName;
    // Strip the suffix literally: replaceAll(".zip", "") treats "." as a regex wildcard
    String localDirName = ftpFileName.substring(0, ftpFileName.length() - ".zip".length());
    /*
     * localDirLocation is a folder named after the archive on the FTP
     * server; the extracted files are copied into it.
     */
    String localDirLocation = location + File.separator + localDirName;
    File localDir = new File(localDirLocation);
    localDir.mkdirs();

    File archiveFile = new File(archiveFileLocation);

    // try-with-resources closes the stream even if the download fails
    try (FileOutputStream archiveFileOutputStream = new FileOutputStream(archiveFile)) {
        ftpCilent.retrieveFile(ftpFileName, archiveFileOutputStream);
    }

    // Delete the archive file (at JVM exit) after copying its contents
    FileUtils.forceDeleteOnExit(archiveFile);

    // Read the archive file from archiveFileLocation.
    try (ZipFile zip = new ZipFile(archiveFileLocation)) {
        Enumeration<? extends ZipEntry> entries = zip.entries();

        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            // Resolve every entry under localDir, not the working directory
            File target = new File(localDir, entry.getName());

            if (entry.isDirectory()) {
                logger.info("Extracting directory " + entry.getName());
                target.mkdirs();
                continue;
            }

            logger.info("Extracting File: " + entry.getName());
            // Make sure the parent folder of a nested entry exists
            target.getParentFile().mkdirs();
            // Close both streams once the entry has been copied
            try (InputStream in = zip.getInputStream(entry);
                 OutputStream out = new FileOutputStream(target)) {
                IOUtils.copy(in, out);
            }
        }
    }
    /*
     * Iterates the folder location provided and loads the files to HDFS
     */
    loadFilesToHDFS(localDirLocation);
}
disconnectFTP();

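The problem with this approach is that the application spends a lot of time downloading the files to a local path, unzipping them, and then loading them into HDFS. Is there a better way to extract the contents of the zip on the fly from the FTP server and feed the content streams directly to the loadFilesToHDFS() method, instead of giving it a path on the local system?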

Best answer

Use a zip stream.
See here:
http://www.oracle.com/technetwork/articles/java/compress-1565076.html

Specifically, see Code Sample 1.
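
As a rough illustration of the zip-stream idea, the sketch below reads an archive sequentially from an arbitrary InputStream with java.util.zip.ZipInputStream and hands each file entry to a callback, so nothing touches the local disk. The class name ZipStreamSketch, the EntryHandler interface, and the helper methods are hypothetical names made up for this example; the in-memory sample archive only stands in for the FTP download so the sketch runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipStreamSketch {

    /** Hypothetical callback: receives an entry name and a stream of its uncompressed bytes. */
    public interface EntryHandler {
        void handle(String entryName, InputStream content) throws IOException;
    }

    /**
     * Walks the archive entry by entry and passes every file entry to the handler.
     * Returns the number of file entries seen. The handler must not close the stream.
     */
    public static int extract(InputStream archive, EntryHandler handler) throws IOException {
        int files = 0;
        try (ZipInputStream zin = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // Until the next getNextEntry() call, zin yields only this entry's data
                    // and signals end-of-stream at the end of the entry.
                    handler.handle(entry.getName(), zin);
                    files++;
                }
                zin.closeEntry();
            }
        }
        return files;
    }

    /** Builds a small in-memory archive; a stand-in for the stream coming from FTP. */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buf)) {
            zout.putNextEntry(new ZipEntry("data/"));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("data/a.xml"));
            zout.write("<a/>".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("b.xml"));
            zout.write("<b/>".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return buf.toByteArray();
    }

    /** Drains a stream into a UTF-8 string without closing it. */
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> seen = new LinkedHashMap<>();
        int files = extract(new ByteArrayInputStream(sampleZip()),
                (name, content) -> seen.put(name, readAll(content)));
        System.out.println(files + " file entries: " + seen);
    }
}
```

In the setting of the question, the archive stream would presumably come from FTPClient.retrieveFileStream(ftpFileName) rather than from memory, with completePendingCommand() called after the stream is closed, and the transfer set to binary mode beforehand. The handler could then copy each entry into an output stream obtained from Hadoop's FileSystem.create(new Path(...)). That wiring is an assumption about how loadFilesToHDFS() might be adapted and is not part of the original answer. Note that a copy utility which closes its input would end the whole archive stream, so the handler should leave the stream open.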

A similar question about "java - Java: Download a .zip file from FTP and extract its contents without saving the file on the local system" can be found on Stack Overflow: https://stackoverflow.com/questions/35694133/
