
Java: How to load all files from a directory into an ArrayList in chunks and process them


I have an application that needs to load all the files in a directory and upload them to S3. The directory holds roughly 100 million small XML files, about 15 GB in total.

This is how I currently list and upload them. It works fine when there are only a few files, but with a large number of files I run into OutOfMemoryError and it fails.

import java.io.File;
import java.util.ArrayList;

import org.apache.commons.io.FilenameUtils;
import org.apache.log4j.Logger;

public class FileProcessThreads {

    private static Logger _logger = Logger.getLogger(FileProcessThreads.class);

    public ArrayList<File> process(String fileLocation) {

        _logger.info("Calling process method of FileProcessThreads class");
        File dir = new File(fileLocation);
        File[] directoryListing = dir.listFiles();
        ArrayList<File> files = new ArrayList<File>();
        if (directoryListing != null) {
            for (File path : directoryListing) {
                String fileType = FilenameUtils.getExtension(path.getName());
                long fileSize = path.length();
                // keep only non-empty .gz files
                if (fileType.equals("gz") && fileSize > 0) {
                    files.add(path);
                }
            }
        }
        _logger.info("Exiting process method of FileProcessThreads class");
        return files;
    }
}

I think loading the files in chunks might work, but how do I do that? The files are always in the same single directory.

Can we increase the size of the array?
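As an aside, one way to read a huge directory in chunks, without File.listFiles() materializing all ~100 million entries in a single array, is java.nio.file.DirectoryStream, which iterates directory entries lazily on most filesystems. The following is only a minimal sketch under that assumption; CHUNK_SIZE and the process helper are hypothetical placeholders, not part of the original code:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ChunkedDirectoryReader {

    // hypothetical chunk size; tune to the available heap
    private static final int CHUNK_SIZE = 1000;

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]);
        List<Path> chunk = new ArrayList<Path>(CHUNK_SIZE);
        // the stream yields entries one at a time, so the full
        // directory listing is never held in memory at once
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.gz")) {
            for (Path entry : stream) {
                chunk.add(entry);
                if (chunk.size() == CHUNK_SIZE) {
                    process(chunk); // e.g. hand the chunk to the S3 uploader
                    chunk.clear();
                }
            }
        }
        if (!chunk.isEmpty()) {
            process(chunk); // leftover files smaller than one chunk
        }
    }

    // placeholder for the real upload logic
    private static void process(List<Path> batch) {
        System.out.println("Processing batch of " + batch.size() + " files");
    }
}

File.listFiles(), by contrast, has to build the whole File[] in one go, which is what exhausts the heap on a directory this large.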

I call this class from here:

import java.io.File;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.apache.log4j.Logger;

import com.amazonaws.services.s3.AmazonS3Client;

public class UploadExecutor {
    private static Logger _logger = Logger.getLogger(UploadExecutor.class);

    public static void main(String[] args) {

        _logger.info("----------STARTING JAVA MAIN METHOD----------------- ");

        /*
         * Example arguments:
         * 3 C:\\Users\\u6034690\\Desktop\\TWOFILE\\xml
         * a205381-tr-fr-production-us-east-1-trf-auditabilty
         */
        while (true) {

            String strNoOfThreads = args[0];
            String strFileLocation = args[1];
            String strBucketName = args[2];

            int iNoOfThreads = Integer.parseInt(strNoOfThreads);
            S3ClientManager s3ClientObj = new S3ClientManager();
            AmazonS3Client s3Client = s3ClientObj.buildS3Client();

            try {
                FileProcessThreads fp = new FileProcessThreads();
                List<File> records = fp.process(strFileLocation);
                try {
                    // wait and re-scan so files written in the meantime are picked up
                    _logger.info("No records found, will wait for 10 seconds");
                    TimeUnit.SECONDS.sleep(10);
                    records = fp.process(strFileLocation);
                } catch (InterruptedException e) {
                    _logger.error("InterruptedException: " + e.toString());
                }
                _logger.info("Total no of Audit files = " + records.size());

                if (!records.isEmpty()) {
                    BuildThread buildThreadObj = new BuildThread();
                    buildThreadObj.buildThreadLogic(iNoOfThreads, s3Client, records, strFileLocation, strBucketName);
                }
            } catch (Throwable t) {
                _logger.error("Unexpected error: " + t.toString());
            }
        }
    }
}

Any help is appreciated.

I can't use the code below, because I need to hand the files to S3 as a list:

Iterator<File> it = FileUtils.iterateFiles(folder, null, true);
while (it.hasNext()) {
    File fileEntry = it.next();
}
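That said, the iterator could still yield a List<File> by buffering its entries into fixed-size lists and handing each list to the uploader. Below is a minimal sketch of that pattern; BATCH_SIZE and the uploadBatch helper are hypothetical placeholders, and note that older commons-io versions build the full file collection inside iterateFiles anyway, so this may not solve the memory problem on its own:

import java.io.File;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.commons.io.FileUtils;

public class IteratorBatcher {

    // hypothetical batch size
    private static final int BATCH_SIZE = 500;

    public static void uploadInBatches(File folder) {
        // only .gz files, non-recursive, matching the original filter
        Iterator<File> it = FileUtils.iterateFiles(folder, new String[] { "gz" }, false);
        List<File> batch = new ArrayList<File>(BATCH_SIZE);
        while (it.hasNext()) {
            batch.add(it.next());
            if (batch.size() == BATCH_SIZE) {
                uploadBatch(batch); // hand a complete list to the uploader
                batch = new ArrayList<File>(BATCH_SIZE);
            }
        }
        if (!batch.isEmpty()) {
            uploadBatch(batch); // leftover files
        }
    }

    // placeholder for the real S3 upload call
    private static void uploadBatch(List<File> batch) {
        System.out.println("Uploading " + batch.size() + " files");
    }
}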

Best Answer

You can return a File[] from the process method instead of an ArrayList, then iterate over the files in the main class and upload them in batches.

import java.io.File;

import org.apache.log4j.Logger;

public class FileProcessThreads {

    private static Logger _logger = Logger.getLogger(FileProcessThreads.class);

    public File[] getFiles(String fileLocation) {

        _logger.info("Calling getFiles method of FileProcessThreads class");
        File dir = new File(fileLocation);
        File[] directoryListing = dir.listFiles();
        if (directoryListing != null && directoryListing.length > 0) {
            return directoryListing;
        }
        _logger.info("No files found in " + fileLocation);
        return null;
    }
}

import java.io.File;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

import org.apache.commons.io.FilenameUtils;
import org.apache.log4j.Logger;

import com.amazonaws.services.s3.AmazonS3Client;

public class UploadExecutor {
    private static Logger _logger = Logger.getLogger(UploadExecutor.class);

    // number of files handed to the upload threads at a time
    private static final int BATCH_SIZE = 1000;

    public static void main(String[] args) {

        _logger.info("----------STARTING JAVA MAIN METHOD----------------- ");

        /*
         * Example arguments:
         * 3 C:\\Users\\u6034690\\Desktop\\TWOFILE\\xml
         * a205381-tr-fr-production-us-east-1-trf-auditabilty
         */
        while (true) {

            String strNoOfThreads = args[0];
            String strFileLocation = args[1];
            String strBucketName = args[2];

            int iNoOfThreads = Integer.parseInt(strNoOfThreads);
            S3ClientManager s3ClientObj = new S3ClientManager();
            AmazonS3Client s3Client = s3ClientObj.buildS3Client();

            try {
                FileProcessThreads fp = new FileProcessThreads();
                File[] files = fp.getFiles(strFileLocation);
                try {
                    // wait and re-scan so files written in the meantime are picked up
                    _logger.info("No records found, will wait for 10 seconds");
                    TimeUnit.SECONDS.sleep(10);
                    files = fp.getFiles(strFileLocation);
                    ArrayList<File> batchFiles = new ArrayList<File>(BATCH_SIZE);
                    if (files != null) {
                        for (File path : files) {
                            String fileType = FilenameUtils.getExtension(path.getName());
                            long fileSize = path.length();
                            if (fileType.equals("gz") && fileSize > 0) {
                                batchFiles.add(path);
                            }
                            // upload as soon as a full batch has accumulated
                            if (batchFiles.size() == BATCH_SIZE) {
                                BuildThread buildThreadObj = new BuildThread();
                                // pass a copy so clearing the list does not affect the upload threads
                                buildThreadObj.buildThreadLogic(iNoOfThreads, s3Client, new ArrayList<File>(batchFiles), strFileLocation, strBucketName);
                                batchFiles.clear();
                            }
                        }
                        _logger.info("Total no of Audit files = " + files.length);
                    }
                    // upload whatever is left over (fewer than BATCH_SIZE files)
                    if (!batchFiles.isEmpty()) {
                        BuildThread buildThreadObj = new BuildThread();
                        buildThreadObj.buildThreadLogic(iNoOfThreads, s3Client, batchFiles, strFileLocation, strBucketName);
                        batchFiles.clear();
                    }
                } catch (InterruptedException e) {
                    _logger.error("InterruptedException: " + e.toString());
                }
            } catch (Throwable t) {
                _logger.error("Unexpected error: " + t.toString());
            }
        }
    }
}

Hope this helps.

For this question about loading all files from a directory into an ArrayList in chunks and processing them in Java, there is a similar question on Stack Overflow: https://stackoverflow.com/questions/55863311/
